Title |
Bottom-k Document Retrieval |
Authors |
Gonzalo Navarro, Sharma Thankachan |
Publication date |
2015 |
Abstract |
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears least often. This has potential applications in data mining, bioinformatics, security, and big data. We show that adapting the classical linear-space solutions for this problem is trivial, but the compressed-space solutions are not easy to extend. We design a new solution for this problem that matches the best-known result when using 2|CSA| + o(n) bits, where CSA is a Compressed Suffix Array. Our structure answers queries in the time needed by the CSA to find the suffix array interval of the pattern plus O(k lg k lg^e n) accesses to suffix array cells, for any constant e > 0. |
Downloaded |
4 times |
Pages |
69-74 |
Volume |
32 |
Journal name |
Journal of Discrete Algorithms |
Publisher |
Elsevier Science (Amsterdam, The Netherlands) |
PDF |
|
Reference URL |
|