Affiliation:
1. Masaryk University, Brno, Czech Republic
Abstract
This chapter focuses on data searching, which is nowadays mostly based on similarity. Similarity search is challenging because of its computational complexity and because similarity is subjective and context-dependent. The authors assume the metric space model of similarity, defined by a domain of objects and a metric function that measures the dissimilarity of object pairs. Because the volume of contemporary data is large, the time efficiency of similarity query execution is essential. This chapter investigates transformations of the metric space to the Hamming space, which represent objects by bit strings called sketches, to decrease the memory and computational complexity of the search. Various challenges of similarity search with sketches in the Hamming space are addressed, including the definition of the sketching transformation and efficient search algorithms that exploit sketches to speed up searching. The indexing of the Hamming space and a heuristic that facilitates the selection of a suitable sketching technique for a given application are also considered.
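The idea of searching with sketches can be illustrated by a minimal Python sketch. It is not the chapter's method: the abstract does not specify a sketching technique, so the code assumes one simple option in which each bit records which of two randomly chosen pivot objects is closer to the encoded object. The Euclidean distance stands in for a generic metric, and all function names (`make_pivot_pairs`, `sketch`, `hamming`, `knn_filter_refine`) and parameters are hypothetical.

```python
import random
from math import dist  # Euclidean distance, used here as a stand-in metric


def make_pivot_pairs(data, bits, rng):
    """Pick one random pair of objects per sketch bit (assumed technique)."""
    return [tuple(rng.sample(data, 2)) for _ in range(bits)]


def sketch(obj, pivot_pairs, metric=dist):
    """Encode obj as an int used as a bit string.

    Bit i is 1 when obj is strictly closer to the first pivot of pair i.
    """
    code = 0
    for i, (p1, p2) in enumerate(pivot_pairs):
        if metric(obj, p1) < metric(obj, p2):
            code |= 1 << i
    return code


def hamming(a, b):
    """Hamming distance: number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def knn_filter_refine(query, data, sketches, pivot_pairs, k, candidates,
                      metric=dist):
    """Approximate k-nearest-neighbour search in two phases.

    Filter: rank all objects by the cheap Hamming distance of their sketches
    to the query sketch and keep the best `candidates` of them.
    Refine: re-rank only those candidates with the original metric.
    Returns the indices of the k best candidates.
    """
    q = sketch(query, pivot_pairs, metric)
    by_sketch = sorted(range(len(data)), key=lambda i: hamming(q, sketches[i]))
    shortlist = by_sketch[:candidates]
    return sorted(shortlist, key=lambda i: metric(query, data[i]))[:k]


# Usage on synthetic 2-D points (illustrative data only)
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
pairs = make_pivot_pairs(points, bits=64, rng=rng)
codes = [sketch(p, pairs) for p in points]
nearest = knn_filter_refine((0.5, 0.5), points, codes, pairs, k=5, candidates=100)
```

The refinement phase evaluates the original metric on only `candidates` objects instead of the whole dataset, which is where the saving comes from. The price is that a true neighbour whose sketch happens to lie far from the query sketch can be missed, so the candidate set size trades accuracy for speed.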