Publications

View publication

Title Inverted Treaps
Authors Roberto Konow, Gonzalo Navarro, Charles L. A. Clarke, Alejandro López-Ortiz
Publication date 2016
Abstract We introduce a new representation of the inverted index that performs faster ranked unions and intersections
while using similar space. Our index is based on the treap data structure, which allows us to intersect/merge
the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step
classical processing methods. To achieve compression, we represent the treap topology using different
alternative compact data structures. Further, the treap invariants allow us to elegantly encode differentially
both document identifiers and frequencies. We also show how to extend this representation to support incremental
updates over the index. Results show that, under the tf-idf scoring scheme, our index uses about the
same space as state-of-the-art compact representations, while performing up to 2-20 times faster on ranked
single-word, union, or intersection queries. Under the BM25 scoring scheme, our index may use up to 40%
more space than the others and outperforms them less frequently but still reaches improvement factors of
2-20 in the best cases. The index supporting incremental updates poses an overhead of 50%-100% over the
static variants in terms of space, construction, and query time.
Pages article 22
Volume 35
Journal name ACM Transactions on Information Systems
Publisher ACM Press (New York, NY, USA)
Reference URL View reference page