Title Smaller Self-Indexes for Natural Language
Authors Nieves Brisaboa, Gonzalo Navarro, Alberto Ordóñez Pereira
Publication date 2012
Abstract Self-indexes for natural-language texts, where these are regarded as token
(word or separator) sequences, achieve very attractive space and search time.
However, they suffer from a space penalty due to their large vocabulary.
In this paper we show that replacing the Huffman encoding they implicitly
use with the slightly weaker Hu-Tucker encoding, which respects the lexical
order of the vocabulary, improves both their space and time.
Pages 372-378
Conference name International Symposium on String Processing and Information Retrieval
Publisher Springer-Verlag (Berlin/Heidelberg, Germany)
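
The trade-off named in the abstract can be illustrated with a small sketch. It compares the total weighted code length of a Huffman code with that of an optimal order-preserving (alphabetic) prefix code over the same word frequencies. This is not the authors' implementation: the function names and the toy frequencies are hypothetical, and the alphabetic cost is computed with a simple cubic dynamic program rather than with the Hu-Tucker algorithm itself, which reaches the same optimal cost more efficiently.

```python
import heapq
from itertools import accumulate


def huffman_cost(weights):
    """Total weighted code length of a Huffman code.

    Symbols may be reordered freely, so this is the minimum over all
    binary prefix codes. The cost equals the sum of all merged weights.
    """
    heap = list(weights)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        cost += a + b
        heapq.heappush(heap, a + b)
    return cost


def alphabetic_cost(weights):
    """Total weighted code length of an optimal alphabetic prefix code.

    `weights` must be listed in the lexical order of the vocabulary.
    Only contiguous ranges may share a subtree, which keeps codewords
    in the same order as the words. O(n^3) dynamic program, chosen for
    clarity (assumption: not the Hu-Tucker algorithm itself).
    """
    n = len(weights)
    prefix = [0] + list(accumulate(weights))
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = prefix[j + 1] - prefix[i]
            # Every leaf in [i, j] gets one bit deeper under the new root.
            cost[i][j] = total + min(
                cost[i][k] + cost[k + 1][j] for k in range(i, j)
            )
    return cost[0][n - 1] if n else 0


if __name__ == "__main__":
    # Hypothetical frequencies for three words in lexical order.
    freqs = [1, 5, 1]
    print("Huffman cost:   ", huffman_cost(freqs))
    print("Alphabetic cost:", alphabetic_cost(freqs))
```

The alphabetic cost can never be lower than the Huffman cost, since an alphabetic code is a prefix code with an extra ordering constraint. That is the sense in which the abstract calls the Hu-Tucker encoding slightly weaker. What the ordering buys in return is that codewords compare in the same order as the words they encode.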