Publications

Stats

View publication

Title On Index-Free Similarity Search in Metric Spaces
Authors Tomas Skopal, Benjamin Bustos
Publication date 2009
Abstract Metric access methods (MAMs) serve as a tool for speeding similarity
queries. However, all MAMs developed so far are index-based; they need to
build an index on a given database. The indexing itself is either static
(the whole database is indexed at once) or dynamic (insertions/ deletions
are supported), but there is always a preprocessing step needed. In this
paper, we propose D-file, the first MAM that requires no indexing at all.
This feature is especially beneficial in domains like data mining, streaming
databases, etc., where the production of data is much more intensive than
querying. Thus, in such environments the indexing is the bottleneck of the
entire production/querying scheme. The idea of D-file is an extension of the
trivial sequential file (an abstraction over the original database,
actually) by so-called D-cache. The D-cache is a main-memory structure that
keeps track of distance computations spent by processing all similarity
queries so far (within a runtime session). Based on the distances stored in
D-cache, the D-file can cheaply determine lower bounds of some distances
while the distances alone have not to be explicitly computed, which results
in faster queries. Our experimental evaluation shows that query efficiency
of D-file is comparable to the index-based state-of-the-art MAMs, however,
for zero indexing costs.
Pages 516-531
Conference name International Conference on Database and Expert Systems Applications
Publisher Springer-Verlag (Berlin/Heidelberg, Germany)
Reference URL View reference page