[en] The notion of string attractor has been introduced by Kempa and Prezza in 2018 in the context of data compression. It represents a set of positions of a finite word in which all of its factors can be “attracted”, i.e., each factor occurs crossing a position from the set. The smallest size γ∗ of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure γ∗ have been studied in 2021 by Mantaci et al. Very recently, a complexity function, called the string attractor profile function, has been introduced for infinite words, by evaluating γ∗ on each prefix. Such a complexity function has been studied for automatic sequences and linearly recurrent infinite words by Schaeffer and Shallit.
Our contribution to the topic is threefold. First, we explore the relationship between the string attractor profile function and other well-known combinatorial notions related to repetitiveness in the context of infinite words, such as the factor complexity and the property of recurrence. Moreover, we study its asymptotic growth in the case of purely morphic words and obtain a complete description in the binary case. Second, we study similar properties for two new string attractor-based complexity functions, in which the structure and the distribution of positions in a string attractor are taken into account. We also show that these measures provide a finer classification of some infinite families of words, namely the Sturmian and quasi-Sturmian words. Third, we explicitly give the three complexities for some specific morphic words called k-bonacci words.
Disciplines :
Mathématiques
Auteur, co-auteur :
Cassaigne, Julien
Gheeraert, France ; Université de Liège - ULiège > Département de mathématique > Mathématiques discrètes
Restivo, Antonio
Romana, Giuseppe
Sciortino, Marinella
Stipulanti, Manon ; Université de Liège - ULiège > Département de mathématique > Mathématiques discrètes
Langue du document :
Anglais
Titre :
New string attractor-based complexities for infinite words
Ambrož, P., Masáková, Z., Pelantová, E., Frougny, C., Palindromic complexity of infinite words associated with simple Parry numbers. Ann. Inst. Fourier, 2006, 2131–2160.
Béal, M.P., Perrin, D., Restivo, A., Decidable problems in substitution shifts. J. Comput. Syst. Sci., 143, 2024, 103529, 10.1016/j.jcss.2024.103529.
Becher, V., Heiber, P.A., On extending de Bruijn sequences. Inf. Process. Lett. 111 (2011), 930–932, 10.1016/j.ipl.2011.06.013.
Bulgakova, D., Frid, A., Scanvic, J., Prefix palindromic length of the Sierpinski word. Developments in Language Theory Lecture Notes in Comput. Sci., vol. 13257, 2022, Springer, Cham, 78–89, 10.1007/978-3-031-05578-2_6.
Cassaigne, J., Complexité et facteurs spéciaux. Bull. Belg. Math. Soc. Simon Stevin 4 (1997), 67–88.
Cassaigne, J., Sequences with grouped factors. Developments in Language Theory, Aristotle University of Thessaloniki, 1997, 211–222.
Cassaigne, J., Recurrence in infinite words. STACS, 2001, Springer, 1–11.
Cassaigne, J., Nicolas, F., Factor complexity. Berthé, V., Rigo, M., (eds.) Combinatorics, Automata and Number Theory, vol. 135, 2010, Cambridge University Press, 163–247.
Castiglione, G., Restivo, A., Sciortino, M., Hopcroft's algorithm and cyclic automata. LATA, 2008, Springer, 172–183.
Castiglione, G., Restivo, A., Sciortino, M., Circular Sturmian words and Hopcroft's algorithm. Theor. Comput. Sci. 410 (2009), 4372–4381.
Constantinescu, S., Ilie, L., The Lempel–Ziv complexity of fixed points of morphisms. SIAM J. Discrete Math. 21 (2007), 466–481.
Dolce, F., String attractors for factors of the Thue-Morse word. WORDS, 2023, Springer, 117–129.
Droubay, X., Justin, J., Pirillo, G., Episturmian words and some constructions of de Luca and Rauzy. Theor. Comput. Sci. 255 (2001), 539–553.
Durand, F., Linearly recurrent subshifts have a finite number of non-periodic subshift factors. Ergod. Theory Dyn. Syst. 20 (2000), 1061–1078.
Durand, F., Corrigendum and addendum to: “Linearly recurrent subshifts have a finite number of non-periodic subshift factors” [Ergodic Theory Dynam. Systems 20 (2000), no. 4, 1061–1078, MR1779393 (2001m:37022)]. Ergod. Theory Dyn. Syst. 23 (2003), 663–669, 10.1017/S0143385702001293.
Durand, F., Perrin, D., Dimension Groups and Dynamical Systems: Substitutions, Bratteli Diagrams and Cantor Systems. Cambridge Studies in Advanced Mathematics, 2022, Cambridge University Press.
Frosini, A., Mancini, I., Rinaldi, S., Romana, G., Sciortino, M., Logarithmic equal-letter runs for BWT of purely morphic words. DLT, 2022, Springer, 139–151.
Gheeraert, F., Romana, G., Stipulanti, M., String attractors of fixed points of k-bonacci-like morphisms. WORDS, 2023, Springer, 192–205.
Glen, A., On Sturmian and episturmian words, and related topics. Ph.D. thesis, 2006, University of Adelaide, Australia.
Heinis, A., Languages under substitutions and balanced words. J. Théor. Nr. Bordx. 16 (2004), 151–172.
Holub, Š., Words with unbounded periodicity complexity. Int. J. Algebra Comput. 24 (2014), 827–836.
Kempa, D., Prezza, N., At the roots of dictionary compression: string attractors. STOC, ACM, 2018, 827–840.
Knuth, D.E., Jr., J.H.M., Pratt, V.R., Fast pattern matching in strings. SIAM J. Comput. 6 (1977), 323–350.
Kociumaka, T., Navarro, G., Prezza, N., Toward a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory 69 (2023), 2074–2092.
Kosolobov, D., Shur, A.M., Comparison of LZ77-type parsings. Inf. Process. Lett. 141 (2019), 25–29.
Kutsukake, K., Matsumoto, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M., On repetitiveness measures of Thue-Morse words. SPIRE, 2020, Springer, 213–220.
Lempel, A., Ziv, J., On the complexity of finite sequences. IEEE Trans. Inf. Theory 22 (1976), 75–81.
Lothaire, M., Algebraic Combinatorics on Words, vol. 90. 2002, Cambridge University Press.
de Luca, A., Mignosi, F., Some combinatorial properties of Sturmian words. Theor. Comput. Sci. 136 (1994), 361–385.
Mantaci, S., Restivo, A., Romana, G., Rosone, G., Sciortino, M., A combinatorial view on string attractors. Theor. Comput. Sci. 850 (2021), 236–248.
Mantaci, S., Restivo, A., Sciortino, M., Burrows-Wheeler transform and Sturmian words. Inf. Process. Lett. 86 (2003), 241–246.
Morse, M., Hedlund, G.A., Symbolic dynamics. Am. J. Math. 60 (1938), 815–866.
Mousavi, H., Automatic theorem proving in Walnut. https://doi.org/10.48550/arXiv.1603.06017, 2016.
Navarro, G., The compression power of the BWT: technical perspective. Commun. ACM, 65, 2022, 90.