Title |
Improved Compressed Indexes for Full-Text Document Retrieval |
Authors |
Djamal Belazzougui, Gonzalo Navarro, Daniel Valenzuela |
Publication date |
2013 |
Abstract |
We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA|+o(n) bits of space, where CSA is a full-text index. Using monotone minimal perfect hash functions (mmphfs), we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA|+O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA|+o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. We give proof-of-concept experimental results that show that using mmphfs may provide relevant practical tradeoffs for document listing with frequencies. |
Pages |
3-13 |
Volume |
18 |
Journal name |
Journal of Discrete Algorithms |
Publisher |
Elsevier Science (Amsterdam, The Netherlands) |
Reference URL |
|