Title Text Indexing for Simple Regular Expressions
Authors Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, Simon Rumle Tarnow
Publication date 2025
Abstract We study the problem of indexing a text T[1..n] ∈ Σ^n so that, later, given a query regular expression pattern R of size m = |R|, we can report all the occ substrings T[i..j] of T matching R. The problem is known to be hard for arbitrary patterns R, so in this paper, we consider the following two types of patterns: (1) character-class Kleene-star patterns of the form P1 D* P2, where P1 and P2 are strings and D = {c_1, ..., c_k} ⊂ Σ is a character class (shorthand for the regular expression (c_1 | c_2 | ... | c_k)), and (2) string Kleene-star patterns of the form P1 P* P2, where P, P1 and P2 are strings. In case (1), we describe an index of O(n log^{1+ε} n) space (for any constant ε > 0) solving queries in time O(m + log n / log log n + occ) on constant-sized alphabets. We also describe a general solution for any alphabet size. This result is conditioned on the existence of an anchor: a character of P1 P2 that does not belong to D. We justify this assumption by proving that no efficient indexing solution can exist if an anchor is not present unless the Set Disjointness Conjecture fails. In case (2), we describe an index of size O(n) answering queries in time O(m + (occ + 1) log^ε n) on any alphabet size.
Pages 20:1-20:16
Conference name Annual Symposium on Combinatorial Pattern Matching
Publisher Schloss Dagstuhl, Leibniz-Zentrum für Informatik (Dagstuhl, Germany)