Abstract |
Two recent lower bounds on the compressibility of repetitive sequences, delta ≤ gamma, have received much attention. It has been shown that a length-n string S over an alphabet of size s can be represented within the optimal O(delta log (n log s / delta log n)) space, and further, that within that space one can find all the occ occurrences in S of any length-m pattern in time O(m log n + occ log^e n) for any constant e > 0. Instead, the near-optimal search time O(m + (occ+1) log^e n) has been achieved only within O(gamma log (n/gamma)) space. Both results are based on considerably different locally consistent parsing techniques. The question of whether the better search time could be supported within the delta-optimal space remained open. In this paper, we prove that both techniques can indeed be combined to obtain the best of both worlds: O(m + (occ+1) log^e n) search time within O(delta log (n log s / delta log n)) space. Moreover, the number of occurrences can be computed in O(m + log^(2+e) n) time within O(delta log (n log s / delta log n)) space. We also show that an extra sublogarithmic factor on top of this space enables optimal O(m + occ) search time, whereas an extra logarithmic factor enables optimal O(m) counting time. |