View publication
Title | Repetitiveness Measures Based on String Morphisms |
Authors | Gonzalo Navarro, Cristian Urbina |
Publication date | 2025 |
Abstract |
We study deterministic Lindenmayer systems, and more generally string morphisms, as mechanisms to capture the repetitiveness of string collections. In particular, we define L-systems, which extend CPD0L-systems with two parameters, d and n, to unambiguously determine a single string w = tau(phi^d(s))[1..n], where phi is the morphism of the system, tau is a coding, and s is its starting symbol. We define ell(w) as the size of the shortest description of an L-system generating w, and show that ell, which builds on the self-similarities that arise in the sequence, is a relevant measure of repetitiveness. We study the relation between ell and a better-established measure called delta, which builds on substring complexity. Our results show that ell and delta are largely orthogonal, in the sense that either can be larger than the other, depending on the string family. In particular, ell can be Theta(sqrt(n)) times smaller than delta, whereas ell is upper-bounded by the smallest context-free grammar that generates the string, and thus can be only polylogarithmically larger than delta. This suggests that both mechanisms capture different kinds of regularities related to repetitiveness. \n\n We then combine the mechanisms of L-systems and of bidirectional macro schemes, the smallest known representation scheme that captures substring copies. We call the combination NU-systems, and show that they can be asymptotically strictly smaller than both mechanisms for the same fixed string family. The size nu of the smallest NU-system is a new minimal reachable repetitiveness measure, and it is also shown to be uncomparable with delta. This shows that combining morphism substitution with copy-paste mechanisms yields a stronger approach to capture string repetitiveness. \n\n We also study algorithmic problems on L-systems and NU-systems, such as the cost of decompression and of accessing substrings of the encoded string without fully decompressing it, and explore various simplifications of L-systems that trade compression effectiveness by algorithmic efficiency to handle them. |
Pages | article 115259 |
Volume | 1043 |
Journal name | Theoretical Computer Science |
Publisher | Elsevier Science (Amsterdam, The Netherlands) |
Reference URL |
![]() |