Paper published in a book (Scientific congresses and symposiums)
Automated text categorization in a dead language. The detection of genres in Late Egyptian
Gohy, Stéphanie; Martin Leon, Benjamin; Polis, Stéphane
2013 In Polis, Stéphane; Winand, Jean (Eds.) Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010
Peer reviewed
 

Files


Full Text
AegLeod9_04_Ramses3.pdf
Publisher postprint (2.99 MB)

Details



Abstract :
[en] This paper is a first step in applying machine learning methods typical of Automated Text Categorization (ATC) for Automatic Genre Identification (AGI) in Late Egyptian, a language written in either hieroglyphic or hieratic scripts that is found in documents from Ancient Egypt dating from ca. 1350-700 BCE. The study is divided into three parts. After a general introduction on AGI (§1), we introduce the levels of annotation that are integrated in the Ramses corpus and can be used when performing AGI on Late Egyptian (§2). In the following section (§3) we offer a brief survey of the types of features that have been discussed in the literature on AGI, before proceeding with three case studies where we apply supervised machine learning methods, namely the naïve Bayes classifier (§4.1), the Support Vector Machine (§4.2), and the Segment and Combine approach (§4.3), to a selection of texts in the corpus. Their respective performances are tested using lexical, part-of-speech and inflectional features.
Disciplines :
Computer science
Languages & linguistics
Classical & oriental studies
Author, co-author :
Gohy, Stéphanie ;  Université de Liège - ULiège > Département des sciences de l'antiquité > Egyptologie
Martin Leon, Benjamin ;  Université de Liège - ULiège > Département des sciences de l'antiquité > Egyptologie
Polis, Stéphane  ;  Université de Liège - ULiège > Département des sciences de l'antiquité > Egyptologie
Language :
English
Title :
Automated text categorization in a dead language. The detection of genres in Late Egyptian
Publication date :
2013
Event name :
Informatique & Égyptologie 2010. Texts, Languages & Information Technology in Egyptology
Event organizer :
Stéphane Polis & Jean Winand
Event place :
Liège, Belgium
Event date :
6-8 July 2010
By request :
Yes
Audience :
International
Main work title :
Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010
Editor :
Polis, Stéphane  ;  Université de Liège - ULiège > Mondes anciens
Winand, Jean  ;  Université de Liège - ULiège > Mondes anciens
Publisher :
Presses Universitaires de Liège, Liège, Belgium
Collection name :
Aegyptiaca Leodiensia 9
Pages :
61-74
Peer reviewed :
Peer reviewed
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
Available on ORBi :
since 28 January 2012

Statistics


Number of views
327 (23 by ULiège)
Number of downloads
285 (3 by ULiège)
