Abstract |
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n] over an alphabet of size sigma, which is represented as a (hopefully much smaller) run-length context-free grammar of size g_rl. We show that the problem can be solved in time O(m^2 log^e n), for any constant e > 0, on a data structure of size O(g_rl). Further, on a locally consistent grammar of size O(delt log (n log s / (delta log n))), the time decreases to O(m log m (log m + log^e n)). The value delta is a function of the substring complexity of T and Omega(delta log (n log s / (delta log n))) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n, s, and delta. We extend our results to several related problems, such as finding k-MEMs, MUMs, rare MEMs, and applications. |