Abstract:
Despite the explosion of diachronic corpora of English in the last few decades, no single corpus yet covers the entire documented history of English. Although the compilation of such a corpus is generally perceived as most attractive (Rissanen 2000: 13), corpus compilers do not seem to expect it to be created in the near future. This is regrettable, as many linguists dealing with longitudinal developments such as grammaticalization need to cover very long time spans and are forced to combine several, not necessarily compatible, corpora (e.g. Hilpert 2008, van Linden 2009). Their results are arguably less reliable than they might be if a single corpus existed. For example, Gries and Hilpert’s (2008) data show a major shift in the collocational profile of shall around 1710; however, this is precisely where one corpus they use ends and a second, rather different one begins.
I have therefore tentatively started compiling such a corpus myself, provisionally called LEON (Leuven English Old to New). The basic architecture of LEON comprises a 400,000-word corpus for each Helsinki Corpus (HC) period and, after 1710, for each of the periods 1710-1780, 1780-1850, 1850-1920, 1920-1990 and post-1990. Data available from 1250-1350, a less well represented period, serve as a template on which the other subperiods are to be modelled, so as to achieve the best possible comparability of genre and region. To make up for the lack of some genres (letters, diaries) and of social stratification, an additional, self-sufficient 600,000-word corpus is envisaged for each period after 1350.
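The word budget implied by this architecture can be sketched in a few lines of Python. This is only an illustrative sketch and not part of LEON itself: the names `Subperiod`, `POST_1710`, `TEMPLATE` and `word_budget` are hypothetical, and "after 1350" is assumed to mean subperiods starting in 1350 or later. The HC periods before 1710 are not enumerated here because the abstract does not list them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CORE_WORDS = 400_000        # core corpus per subperiod (from the abstract)
SUPPLEMENT_WORDS = 600_000  # additional corpus per subperiod after 1350
SUPPLEMENT_FROM = 1350      # assumption: "after 1350" = start year >= 1350


@dataclass(frozen=True)
class Subperiod:
    """Hypothetical record for one LEON subperiod."""
    start: int
    end: Optional[int]  # None marks an open-ended period (e.g. post-1990)

    @property
    def label(self) -> str:
        if self.end is None:
            return f"post-{self.start}"
        return f"{self.start}-{self.end}"

    @property
    def supplement_words(self) -> int:
        # Only subperiods after 1350 receive the supplementary corpus.
        return SUPPLEMENT_WORDS if self.start >= SUPPLEMENT_FROM else 0

    @property
    def total_words(self) -> int:
        return CORE_WORDS + self.supplement_words


# The template period and the post-1710 subperiods named in the abstract.
TEMPLATE = Subperiod(1250, 1350)
POST_1710: List[Subperiod] = [
    Subperiod(1710, 1780),
    Subperiod(1780, 1850),
    Subperiod(1850, 1920),
    Subperiod(1920, 1990),
    Subperiod(1990, None),
]


def word_budget(periods: List[Subperiod]) -> Dict[str, int]:
    """Map each subperiod label to its planned total word count."""
    return {p.label: p.total_words for p in periods}


if __name__ == "__main__":
    for label, words in word_budget([TEMPLATE] + POST_1710).items():
        print(f"{label}: {words:,} words")
```

Under these assumptions, the template period 1250-1350 receives only the 400,000-word core, while each later subperiod listed is planned at 400,000 + 600,000 words.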
While LEON is primarily conceived as a ‘meta-corpus’ that mines existing corpora, some additions are envisaged too (e.g. the unedited Statutes Rwl. B.520, dated a1325). LEON does not aim at full comparability (which would be presumptuous), but rather seeks to optimize the usefulness of concepts like ‘equal size of subperiods’ or ‘diachronic text prototype’ (HC). Compared to the present ‘big evil’, LEON might be a ‘lesser evil’.
References
Gries, Stefan Th. & Martin Hilpert. 2008. The identification of stages in diachronic data: Variability-based neighbour clustering. Corpora 3 (1): 59–81.
Hilpert, Martin. 2008. Germanic future constructions: A usage-based approach to language change. Amsterdam & Philadelphia: John Benjamins.
Los, Bettelou. 2005. The rise of the to-infinitive. Oxford: Oxford University Press.
Rissanen, Matti & Merja Kytö. 1993. General introduction. In Rissanen, Matti, Merja Kytö & Minna Palander-Collin, eds. 1993. Early English in the computer age: Explorations through the Helsinki Corpus. Berlin: Mouton de Gruyter. 1-17.
Rissanen, Matti. 2000. The world of English historical corpora: From Cædmon to computer age. Journal of English Linguistics 28: 7-20.
van Linden, An. 2009. Dynamic, deontic and evaluative adjectives and their clausal complement patterns: A synchronic-diachronic account. PhD dissertation, University of Leuven.