data-sharing; corpus linguistics; linguistic fieldwork; language archive; oral literature
Abstract :
[en] This paper discusses current practices of data-sharing within linguistics and argues that different types of data are shared for different reasons. A first type of data includes answers to a questionnaire on binominal lexemes supplied to a colleague carrying out a cross-linguistic investigation of that topic. I provided him with data on Harakmbut, a language isolate spoken in the Peruvian Amazon, and later used these data myself in two publications. One of the outlets required the publication of the dataset underlying the article so as to ensure replicability and adequate evaluation of the article. A second type of data concerns corpus examples and their analysis in terms of several analytical parameters. The data come from the Germanic languages Dutch and English, and were also shared within the context of the publication of journal articles (currently under review). The repositories used are Zenodo and the ULiège Dataverse. A third type of data originates in my language documentation and description work on Harakmbut, which is critically endangered. The data mainly comprise sound recordings and annotation files including the segmentation, transcription and analysis of the recordings collected in the field. I am currently in the process of archiving these data with the California Language Archive, to help preserve the cultural heritage of the Harakmbut community, and to make sure other scholars (e.g. linguists, anthropologists, biologists, archaeologists) can reuse my work in the future.
Research Center/Unit :
Lilith - Liège, Literature, Linguistics - ULiège
Disciplines :
Languages & linguistics
Author, co-author :
Van Linden, An ; Université de Liège - ULiège > Département de langues modernes : linguistique, littérature et traduction > Linguistique synchronique anglaise