Title |
Faster Top-k Document Retrieval in Optimal Space |
Authors |
Gonzalo Navarro, Sharma Thankachan |
Publication date |
2013 |
Abstract |
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears most often. We show that, by representing the collection using a Compressed Suffix Array CSA, a data structure using the asymptotically optimal |CSA|+o(n) bits can answer queries in the time needed by CSA to find the suffix array interval of the pattern plus O(k lg^2 k lg^e n) accesses to suffix array cells, for any constant e>0. This is lg n / lg k times faster than the only previous solution using optimal space, lg k times slower than the fastest structure that uses twice the space, and lg^2 k lg^e n times the lower-bound cost of obtaining k document identifiers from the CSA. To obtain the result we introduce a tool called the sampled document array, which can be of independent interest. |
Downloaded |
8 times |
Pages |
255-262 |
Conference name |
International Symposium on String Processing and Information
Retrieval |
Publisher |
Springer-Verlag (Berlin/Heidelberg, Germany) |
PDF |
|
Reference URL |
|