Publications

Stats

View publication

Title Document Listing on Repetitive Collections
Authors Travis Gagie, Kalle Karhu, Gonzalo Navarro, Simon Puglisi, Jouni Sir'en
Publication date 2013
Abstract Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing. In our experiments, our additional structures on top of the RLCSA can reduce the query time for document listing by an order of magnitude while still using total space that is only a fraction of the uncompressed size of the collection. As a byproduct, we develop a new document listing technique for general collections that is of independent interest.
Pages 107-119
Conference name Annual Symposium on Combinatorial Pattern Matching
Publisher Springer-Verlag (Berlin/Heidelberg, Germany)
Reference URL View reference page