Publications

Title Compressing Huffman Models on Large Alphabets
Authors Gonzalo Navarro, Alberto Ordóñez Pereira
Publication date 2013
Abstract A naive storage of a Huffman model on a text of length n over an alphabet of size s requires O(s log n) bits. This can be reduced to s log s + O(s) bits using canonical codes. This overhead over the entropy can be significant when s is comparable to n, and it also dictates the amount of main memory required to compress or decompress. We design an encoding scheme that requires O(s log log n) bits in the worst case, and typically less, while supporting encoding and decoding of symbols in O(log log n) time. We show that our technique reduces the storage size of the model to around 15% of that of current techniques in various real-life sequences over large alphabets, while still offering reasonable compression/decompression times.
Pages 381-390
Conference name Data Compression Conference
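
Note: the abstract's starting point is that a canonical Huffman code can be rebuilt from the code lengths alone, which is why the model shrinks from O(s log n) to roughly s log s + O(s) bits. The Python sketch below is only a rough illustration of that idea under assumed names (huffman_code_lengths, canonical_codes); it is not the paper's O(s log log n) scheme.

import heapq
from collections import Counter

def huffman_code_lengths(text):
    # Build a Huffman tree and keep only each symbol's code length.
    freq = Counter(text)
    if len(freq) == 1:
        return {sym: 1 for sym in freq}  # degenerate one-symbol alphabet
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def canonical_codes(lengths):
    # Reassign codewords canonically: symbols sorted by (length, symbol) get
    # consecutive codes, so the (symbol, length) pairs are the entire model.
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, '0{}b'.format(length))
        code += 1
        prev_len = length
    return codes

if __name__ == '__main__':
    lengths = huffman_code_lengths('abracadabra')
    print(canonical_codes(lengths))  # e.g. a -> 0, b -> 100, c -> 101, ...

In this toy example a decoder only needs the symbols and their lengths to reproduce exactly the same codewords, which is the model-size saving the abstract refers to.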