Deep Learning As an Aid to Text Mining in the Choice of Texts to Lemmatise for a Comparison Corpus: A Stylistic Study of Peter Damian’s Letters

Thon, Valérie; Vanni, Laurent; Longrée, Dominique

doi:10.1007/978-3-031-55917-4_14

No full text

Paper published in a book (Scientific congresses and symposiums)

2024 • In Giordano, Giuseppe (Ed.) New Frontiers in Textual Data Analysis

Peer reviewed

Permalink
https://hdl.handle.net/2268/324080

Details

Keywords:

Lemmatisation; Morphosyntax; Prediction; Distance calculation; Labelings; Learning models; Lemmatization; Linguistic patterns; Text-mining; Computer Science Applications; Information Systems; Information Systems and Management; Analysis

Abstract:

[en] Lemmatising and morphosyntactically labelling a Latin text is a time-consuming process. Focusing on the epistolary corpus of Peter Damian (eleventh century), an ecclesiastical author of 180 Latin letters, this contribution combines intertextual distance calculations (Brunet and Jaccard) with a deep learning model trained for authorship classification on a selection of unlemmatised texts by 39 of his literary predecessors. The aim is to identify, in theory, which text(s) share a style similar to Peter’s and would therefore be suitable candidates for precise lemmatisation. A dialogue between the two methods seems promising, and the activation zones of the deep learning model even suggest that it recognises complex linguistic patterns that Peter possibly shares with some of his predecessors.
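The abstract names the Jaccard distance as one of the two intertextual measures. Purely as an illustration, the sketch below computes a Jaccard distance over the sets of word forms of two unlemmatised texts and ranks candidate texts by their distance to a reference text. The naive tokeniser, the function names (`tokenize`, `jaccard_distance`, `rank_candidates`) and the toy Latin snippets are hypothetical assumptions; neither the Brunet distance nor the deep learning model used in the paper is reproduced here.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return the set of lowercased word forms (naive tokeniser, an assumption)."""
    # Letters only: digits and underscores are excluded; no lemmatisation is applied.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance 1 - |A & B| / |A | B| between the vocabularies of two texts."""
    va, vb = tokenize(a), tokenize(b)
    union = va | vb
    if not union:  # two empty texts are treated as identical
        return 0.0
    return 1.0 - len(va & vb) / len(union)


def rank_candidates(reference: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Sort hypothetical candidate texts from closest to farthest from the reference."""
    scores = {name: jaccard_distance(reference, text) for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy snippets, not data from the study.
    reference = "fratres carissimi deus omnipotens"
    candidates = {"text_a": "fratres carissimi deus", "text_b": "arma virumque cano"}
    print(rank_candidates(reference, candidates))  # [('text_a', 0.25), ('text_b', 1.0)]
```

Working on raw word forms mirrors the setting described in the abstract, where the predecessors' texts are not yet lemmatised; a set-based measure like this ignores frequencies, which is one reason to cross it with other methods.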

Disciplines:

Computer science
Languages & linguistics

Author, co-author:

Thon, Valérie; Université de Liège - ULiège > Mondes anciens

Vanni, Laurent; CNRS, UCA, UMR7320 BCL, Nice, France

Longrée, Dominique; Université de Liège - ULiège > Département des sciences de l'antiquité > Langue et littérature latines

Language:

English

Publication date:

24 September 2024

Event name:

JADT 2022

Event place:

Naples, Italy

Event date:

6–8 July 2022

Main work title:

New Frontiers in Textual Data Analysis

Editor:

Giordano, Giuseppe

Publisher:

Springer Science and Business Media Deutschland GmbH

ISBN/EAN:

978-3-031-55916-7

Pages:

173-184

Peer review/Selection committee:

Peer reviewed

Additional URL:

https://link.springer.com/content/pdf/10.1007/978-3-031-55917-4_14

Available on ORBi:

since 05 November 2024

Bibliography

Bartsch, S., & Schiesaro, A. (Eds.). (2015). The Cambridge companion to Seneca. Cambridge University Press.
Brunet, É. (2003). Peut-on mesurer la distance entre deux textes ? Corpus, 2, 47–70.
Brunet, É., & Vanni, L. (2019). Deep learning et authentification des textes. Texto ! Textes et cultures, 24(1), 1–34.
Engels, L. J. (1988). Aspekte der Anwendung von Exempla bei Petrus Damiani. In W. J. Aerts & M. Gosman (Eds.), Exemplum et similitudo. Alexander the great and other heroes as points of reference in medieval literature (pp. 19–53). Egbert Forsten.
Henriet, P., & Polo de Beaulieu, M. A. (2023). Pierre Damien et les exempla. Stratégies d’auteur et réception. Civilisation médiévale, 52, Classiques Garnier.
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1746–1751). Doha, Qatar.
Lafon, P. (1980). Sur la variabilité de la fréquence des formes dans un corpus. Mots, 1, 127–165.
Leclercq, J. (1957). Saint Pierre Damien écrivain. Convivium, 25, 385–399.
Longrée, D., & Poudat, C. (2010). New ways of lemmatizing and tagging classical and postclassical Latin: The LATLEM project of the LASLA. In P. Anreiter & M. Kienpointner (Eds.), Proceedings of the 15th international colloquium on Latin linguistics (pp. 683–694).
Martini, P. S. (2002). L’inventario del secolo XII della biblioteca di Santa Croce di Fonte Avellana. In L. Gatto & P. S. Martini (Eds.), Studi sulle società e le culture del Medioevo per Girolamo Arnaldi (pp. 629–641). All’Insegna del Giglio.
Reindel, K. (1983–1993). Die Briefe des Petrus Damiani. Teil 1–4. Die Briefe der deutschen Kaiserzeit. Monumenta Germaniae Historica.
Thon, V., Vanni, L., & Longrée, D. (2022). Le deep learning auxiliaire de l’ADT dans le choix de textes à étiqueter en vue d’un corpus de comparaison: à propos de l’étude stylistique des lettres de Pierre Damien. In JADT 2022: Proceedings of the 16th international conference on statistical analysis of textual data, 2 (pp. 834–841).
Thon, V., Vanni, L., & Longrée, D. (2023). To what extent are lemmatisation and annotation relevant for deep learning assignments and textual motifs detection? The case-study of Peter Damian’s letters (11th century). In La memoria digitale. Atti del XII convegno annuale AIUCD (Università di Siena, 5–7 giugno 2023) (pp. 254–259).
Vanni, L., Mayaffre, D., & Longrée, D. (2018a). ADT et deep learning, regards croisés. Phrases-clefs, motifs et nouveaux observables. In JADT 2018: Proceedings of the 14th international conference on statistical analysis of textual data (pp. 459–466).
Vanni, L., Ducoffe, M., Mayaffre, D., Precioso, F., Longrée, D., Elango, V., et al. (2018b). Text Deconvolution Saliency (TDS): A deep tool box for linguistic analysis. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 548–557).
Verkerk, P., Ouvrard, P., Fantoli, M., & Longrée, D. (2020). L.A.S.L.A. and Collatinus: A convergence in lexica. In L. Tesconi (Ed.), Studi e saggi linguistici 2020, 1 (pp. 1–26). Edizioni ETS.