Publications

Stats

View publication

Title Computing MEMs and Relatives on Repetitive Text Collections
Authors Gonzalo Navarro
Publication date 2025
Abstract We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n] over an alphabet of size sigma, which is represented as a (hopefully much smaller) run-length context-free grammar of size g_rl. We show that the problem can be solved in time O(m^2 log^e n), for any constant e > 0, on a data structure of size O(g_rl). Further, on a locally consistent grammar of size O(delt log (n log s / (delta log n))), the time decreases to O(m log m (log m + log^e n)). The value delta is a function of the substring complexity of T and Omega(delta log (n log s / (delta log n))) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n, s, and delta. We extend our results to several related problems, such as finding k-MEMs, MUMs, rare MEMs, and applications.
Pages article 12
Volume 21
Journal name ACM Transactions on Algorithms
Publisher ACM Press (New York, NY, USA)
Reference URL View reference page