Title | Smaller Self-Indexes for Natural Language |
Authors | Nieves Brisaboa, Gonzalo Navarro, Alberto Ordóñez Pereira |
Publication date | 2012 |
Abstract | Self-indexes for natural-language texts, where these are regarded as token (word or separator) sequences, achieve very attractive space and search time. However, they suffer from a space penalty due to their large vocabulary. In this paper we show that by replacing the Huffman encoding they implicitly use by the slightly weaker Hu-Tucker encoding, which respects the lexical order of the vocabulary, both their space and time are improved. |
Pages | 372-378 |
Conference name | International Symposium on String Processing and Information Retrieval |
Publisher | Springer-Verlag (Berlin/Heidelberg, Germany) |