New string attractor-based complexities for infinite words

String attractor; factor complexity; recurrence function; appearance function; repetitiveness measure; span complexity; leftmost complexity; morphic word; Sturmian word; quasi-Sturmian word; k-bonacci word; k-bonacci number

Abstract :

[en] The notion of string attractor has been introduced by Kempa and Prezza in 2018 in the context of data compression. It represents a set of positions of a finite word in which all of its factors can be “attracted”, i.e., each factor occurs crossing a position from the set. The smallest size γ∗ of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure γ∗ have been studied in 2021 by Mantaci et al. Very recently, a complexity function, called the string attractor profile function, has been introduced for infinite words, by evaluating γ∗ on each prefix. Such a complexity function has been studied for automatic sequences and linearly recurrent infinite words by Schaeffer and Shallit. Our contribution to the topic is threefold. First, we explore the relationship between the string attractor profile function and other well-known combinatorial notions related to repetitiveness in the context of infinite words, such as the factor complexity and the property of recurrence. Moreover, we study its asymptotic growth in the case of purely morphic words and obtain a complete description in the binary case. Second, we study similar properties for two new string attractor-based complexity functions, in which the structure and the distribution of positions in a string attractor are taken into account. We also show that these measures provide a finer classification of some infinite families of words, namely the Sturmian and quasi-Sturmian words. Third, we explicitly give the three complexities for some specific morphic words called k-bonacci words.

Disciplines :

Mathematics

Author, co-author :

Cassaigne, Julien

Gheeraert, France ; Université de Liège - ULiège > Département de mathématique > Mathématiques discrètes

Restivo, Antonio

Romana, Giuseppe

Sciortino, Marinella

Stipulanti, Manon ; Université de Liège - ULiège > Département de mathématique > Mathématiques discrètes

Language :

English

Title :

New string attractor-based complexities for infinite words

Publication date :

2024

Journal title :

Journal of Combinatorial Theory. Series A

ISSN :

0097-3165

Publisher :

Elsevier, Atlanta, Georgia

Volume :

208

Pages :

105936

Peer reviewed :

Peer Reviewed verified by ORBi

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 11 December 2023

Statistics

Number of views

164 (18 by ULiège)

Number of downloads

231 (8 by ULiège)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

OpenCitations

OpenAlex citations

Bibliography

Allouche, J.P., Shallit, J.O., Automatic Sequences: Theory, Applications, Generalizations. 2003, Cambridge University Press.
Ambrož, P., Masáková, Z., Pelantová, E., Frougny, C., Palindromic complexity of infinite words associated with simple Parry numbers. Ann. Inst. Fourier, 2006, 2131–2160.
Béal, M.P., Perrin, D., Restivo, A., Decidable problems in substitution shifts. J. Comput. Syst. Sci., 143, 2024, 103529, 10.1016/j.jcss.2024.103529.
Becher, V., Heiber, P.A., On extending de Bruijn sequences. Inf. Process. Lett. 111 (2011), 930–932, 10.1016/j.ipl.2011.06.013.
Bulgakova, D., Frid, A., Scanvic, J., Prefix palindromic length of the Sierpinski word. Developments in Language Theory Lecture Notes in Comput. Sci., vol. 13257, 2022, Springer, Cham, 78–89, 10.1007/978-3-031-05578-2_6.
Cassaigne, J., Complexité et facteurs spéciaux. Bull. Belg. Math. Soc. Simon Stevin 4 (1997), 67–88.
Cassaigne, J., Sequences with grouped factors. Developments in Language Theory, Aristotle University of Thessaloniki, 1997, 211–222.
Cassaigne, J., Recurrence in infinite words. STACS, 2001, Springer, 1–11.
Cassaigne, J., Nicolas, F., Factor complexity. Berthé, V., Rigo, M., (eds.) Combinatorics, Automata and Number Theory, vol. 135, 2010, Cambridge University Press, 163–247.
Castiglione, G., Restivo, A., Sciortino, M., Hopcroft's algorithm and cyclic automata. LATA, 2008, Springer, 172–183.
Castiglione, G., Restivo, A., Sciortino, M., Circular Sturmian words and Hopcroft's algorithm. Theor. Comput. Sci. 410 (2009), 4372–4381.
Constantinescu, S., Ilie, L., The Lempel–Ziv complexity of fixed points of morphisms. SIAM J. Discrete Math. 21 (2007), 466–481.
Dolce, F., String attractors for factors of the Thue-Morse word. WORDS, 2023, Springer, 117–129.
Droubay, X., Justin, J., Pirillo, G., Episturmian words and some constructions of de Luca and Rauzy. Theor. Comput. Sci. 255 (2001), 539–553.
Durand, F., Linearly recurrent subshifts have a finite number of non-periodic subshift factors. Ergod. Theory Dyn. Syst. 20 (2000), 1061–1078.
Durand, F., Corrigendum and addendum to: “Linearly recurrent subshifts have a finite number of non-periodic subshift factors” [Ergodic Theory Dynam. Systems 20 (2000), no. 4, 1061–1078, MR1779393 (2001m:37022)]. Ergod. Theory Dyn. Syst. 23 (2003), 663–669, 10.1017/S0143385702001293.
Durand, F., Host, B., Skau, C., Substitutional dynamical systems, Bratteli diagrams and dimension groups. Ergod. Theory Dyn. Syst. 19 (1999), 953–993.
Durand, F., Perrin, D., Dimension Groups and Dynamical Systems: Substitutions, Bratteli Diagrams and Cantor Systems. Cambridge Studies in Advanced Mathematics, 2022, Cambridge University Press.
Dvořáková, Ľ., String attractors of episturmian sequences. Theor. Comput. Sci., 986, 2024, 114341, 10.1016/j.tcs.2023.114341.
Dvořáková, Ľ., Hendrychová, V., String attractors of Rote sequences. https://doi.org/10.48550/arXiv.2308.00850, 2023.
Frosini, A., Mancini, I., Rinaldi, S., Romana, G., Sciortino, M., Logarithmic equal-letter runs for BWT of purely morphic words. DLT, 2022, Springer, 139–151.
Gheeraert, F., Romana, G., Stipulanti, M., String attractors of fixed points of k-bonacci-like morphisms. WORDS, 2023, Springer, 192–205.
Glen, A., On Sturmian and episturmian words, and related topics. Ph.D. thesis, 2006, University of Adelaide, Australia.
Heinis, A., Languages under substitutions and balanced words. J. Théor. Nr. Bordx. 16 (2004), 151–172.
Holub, Š., Words with unbounded periodicity complexity. Int. J. Algebra Comput. 24 (2014), 827–836.
Kempa, D., Prezza, N., At the roots of dictionary compression: string attractors. STOC, ACM, 2018, 827–840.
Knuth, D.E., Jr., J.H.M., Pratt, V.R., Fast pattern matching in strings. SIAM J. Comput. 6 (1977), 323–350.
Kociumaka, T., Navarro, G., Prezza, N., Toward a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory 69 (2023), 2074–2092.
Kosolobov, D., Shur, A.M., Comparison of LZ77-type parsings. Inf. Process. Lett. 141 (2019), 25–29.
Kutsukake, K., Matsumoto, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M., On repetitiveness measures of Thue-Morse words. SPIRE, 2020, Springer, 213–220.
Lempel, A., Ziv, J., On the complexity of finite sequences. IEEE Trans. Inf. Theory 22 (1976), 75–81.
Lothaire, M., Algebraic Combinatorics on Words, vol. 90. 2002, Cambridge University Press.
de Luca, A., Mignosi, F., Some combinatorial properties of Sturmian words. Theor. Comput. Sci. 136 (1994), 361–385.
Mantaci, S., Restivo, A., Romana, G., Rosone, G., Sciortino, M., A combinatorial view on string attractors. Theor. Comput. Sci. 850 (2021), 236–248.
Mantaci, S., Restivo, A., Sciortino, M., Burrows-Wheeler transform and Sturmian words. Inf. Process. Lett. 86 (2003), 241–246.
Morse, M., Hedlund, G.A., Symbolic dynamics. Am. J. Math. 60 (1938), 815–866.
Mousavi, H., Automatic theorem proving in Walnut. https://doi.org/10.48550/arXiv.1603.06017, 2016.
Navarro, G., The compression power of the BWT: technical perspective. Commun. ACM, 65, 2022, 90.
Navarro, G., Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54 (2022), 29:1–29:31.
Navarro, G., Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54 (2022), 26:1–26:32.
Pansiot, J.J., Complexité des facteurs des mots infinis engendrés par morphimes itérés. ICALP, 1984, Springer, 380–389.
Pansiot, J.J., Decidability of periodicity for infinite words. RAIRO Theor. Inform. Appl. 20 (1986), 43–46.
Rauzy, G., Mots infinis en arithmetique. Nivat, M., Perrin, D., (eds.) Automata on Infinite Word, vol. 192, 1985, Springer Berlin Heidelberg, 164–171.
Restivo, A., Romana, G., Sciortino, M., String attractors and infinite words. LATIN, 2022, Springer, 426–442.
Schaeffer, L., Shallit, J.O., String attractors for automatic sequences. https://doi.org/10.48550/arXiv.2012.06840, 2021.
Sciortino, M., Zamboni, L.Q., Suffix automata and standard Sturmian words. Developments in Language Theory, 2007, Springer, 382–398.
Sloane, N.J.A., The on-line encyclopedia of integer sequences. http://oeis.org, 1964.