Publications

Stats

View publication

Title Repetitiveness Measures Based on String Morphisms
Authors Gonzalo Navarro, Cristian Urbina
Publication date 2025
Abstract We study deterministic Lindenmayer systems, and more generally string morphisms, as mechanisms to capture the repetitiveness of string collections. In particular, we define L-systems, which extend CPD0L-systems with two parameters, d and n, to unambiguously determine a single string w = tau(phi^d(s))[1..n], where phi is the morphism of the system, tau is a coding, and s is its starting symbol. We define ell(w) as the size of the shortest description of an L-system generating w, and show that ell, which builds on the self-similarities that arise in the sequence, is a relevant measure of repetitiveness.
We study the relation between ell and a better-established measure called delta, which builds on substring complexity. Our results show that ell and delta are largely orthogonal, in the sense that either can be larger than the other, depending on the string family. In particular, ell can be Theta(sqrt(n)) times smaller than delta, whereas ell is upper-bounded by the smallest context-free grammar that generates the string, and thus can be only polylogarithmically larger than delta. This suggests that both mechanisms capture different kinds of regularities related to repetitiveness.
\n\n
We then combine the mechanisms of L-systems and of bidirectional macro schemes, the smallest known representation scheme that captures substring copies. We call the combination NU-systems, and show that they can be asymptotically strictly smaller than both mechanisms for the same fixed string family. The size nu of the smallest NU-system is a new minimal reachable repetitiveness measure, and it is also shown to be uncomparable with delta. This shows that combining morphism substitution with copy-paste mechanisms yields a stronger approach to capture string repetitiveness.
\n\n
We also study algorithmic problems on L-systems and NU-systems, such as the cost of decompression and of accessing substrings of the encoded string without fully decompressing it, and explore various simplifications of L-systems that trade compression effectiveness by algorithmic efficiency to handle them.
Pages article 115259
Volume 1043
Journal name Theoretical Computer Science
Publisher Elsevier Science (Amsterdam, The Netherlands)
Reference URL View reference page