Publications

Stats

View publication

Title Fully-Compressed Suffix Trees
Authors Luís Russo, Gonzalo Navarro, Arlindo Oliveira
Publication date 2011
Abstract Suffix trees are by far the most important data structure in
stringology, with a myriad of applications in fields like
bioinformatics and information retrieval. Classical representations of
suffix trees require $\Theta(n \log n)$ bits of space, for a string of size
$n$. This is considerably more than the $n \log_2\sigma$ bits needed
for the string itself, where $\sigma$ is the alphabet size. The size
of suffix trees has been a barrier to their wider adoption in
practice. Recent compressed suffix tree representations require just
the space of the compressed string plus $\Theta(n)$ extra bits. This
is already spectacular, but the linear extra bits are still unsatisfactory
when $\sigma$ is small as in DNA sequences. In this paper we introduce the
first compressed suffix tree representation that breaks this $\Theta(n)$-bit
space barrier. The Fully Compressed Suffix Tree (FCST) representation
requires only sublinear space on top of the compressed text size, and
supports a wide set of navigational operations in almost logarithmic time.
This includes extracting arbitrary text substrings, so the FCST replaces
the text using almost the same space as the compressed text.
An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation.
We reveal important connections between LCAs and suffix tree
navigation. We also describe how to make FCSTs dynamic, \ie, support updates
to the text. The dynamic FCST also supports several operations. In particular
it can build the static FCST within optimal space and polylogarithmic
time per symbol. Our theoretical results are also validated experimentally,
showing that FCSTs are very effective in practice as well.
Pages article 53
Volume 7
Journal name ACM Transactions on Algorithms
Publisher ACM Press (New York, NY, USA)
Reference URL View reference page