Title Top-k Ranked Document Search in General Text Databases
Authors J. Shane Culpepper, Gonzalo Navarro, Simon J. Puglisi, Andrew Turpin
Publication date 2010
Abstract Text search engines return a set of $k$ documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of words and traditional indexing approaches are not so easily adapted, or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.
Pages 194-205
Conference name Annual European Symposium on Algorithms
Publisher Springer-Verlag (Berlin/Heidelberg, Germany)