Publications

Stats

View publication

Title On Compressing and Indexing Repetitive Sequences
Authors Sebastián Kreft, Gonzalo Navarro
Publication date 2013
Abstract We introduce LZ-End, a new member of the Lempel-Ziv family of text compressors, which achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary text substrings. We then build the first self-index based on LZ77 (or LZ-End) compression, which in addition to text extraction offers fast indexed searches on the compressed text. This self-index is particularly effective for representing highly repetitive sequence collections, which arise for example when storing versioned documents, software repositories, periodic publications, and biological sequence databases.
Pages 115-133
Volume 483
Journal name Theoretical Computer Science
Publisher Elsevier Science (Amsterdam, The Netherlands)
Reference URL View reference page