Abstract:
[en] This paper aims to contribute to Distributional Typology, whose explicit aim is to investigate linguistic diversity directly (“what’s where why?”, Bickel 2007), by investigating the typology of (co-)lexicalization patterns using a bottom-up approach to semantic maps. Specifically, we propose a new method for constructing semantic maps on the basis of massive cross-linguistic data, in order to evaluate the effects of (i) inheritance, (ii) language contact, and (iii) other environmental and cultural factors on patterns of polysemy and co-lexicalization. This method allows a fine-grained analysis of the factors that lead to the effects identified by areal lexico-semantics (Koptjevskaja-Tamm & Liljegren, 2017).
The semantic map model was initially created in order to describe the polysemy patterns of grammatical morphemes (see Cysouw, Haspelmath, & Malchukov, 2010 for an overview). Although studies using the model cover a wide range of linguistic phenomena, the majority pertain to the domain of grammar (e.g., Haspelmath, 1997; van der Auwera & Plungian, 1998). However, recent studies by François (2008), Perrin (2010), Wälchli and Cysouw (2012), Rakhilina and Reznikova (2016), Youn et al. (2016) and Georgakopoulos et al. (2016) have shown that the model can fruitfully be extended to lexical items. The common denominator in both lines of research is that the semantic maps were usually plotted manually, which is particularly problematic for large-scale typological studies.
In this paper, we show that existing synchronic polysemy data in large language samples, such as ASJP (Wichmann et al., 2016), CLICS (List et al., 2014), and the Open Multilingual Wordnet (Bond & Paik, 2012), can be turned into lexical matrices using Python scripts. From these lexical matrices, one can infer large-scale weighted classical lexical semantic maps, using an adapted version of the algorithm introduced by Regier, Khetarpal, and Majid (2013). This approach is innovative in several respects. First, lexical semantic maps are automatically plotted and inferred directly from a significant amount of cross-linguistic data (cf. Youn et al., 2016). Second, unlike other types of polysemy networks in the field, these maps are structured, respecting the connectivity hypothesis (Croft, 2001) and what we call the ‘economy principle’. As such, they generate more interesting implicational universals and can be falsified on the basis of additional empirical evidence. Finally, weighted lexical semantic maps make it possible to explore the frequency of polysemy patterns and shared lexicalizations from both a semasiological and an onomasiological perspective, which is hardly achievable with other methods.
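The pipeline described above can be illustrated with a minimal sketch. It is not the adapted algorithm used in the paper: it is a simplified greedy procedure in the spirit of Regier, Khetarpal, and Majid (2013), and the input format (triples of language, form, meaning), the function names, and the toy data are all assumptions made for illustration. Each lexical item is a row of the matrix, each meaning a column; edges are added one at a time, always choosing the edge that helps the largest number of items become connected, and each edge is then weighted by the number of items that colexify its two meanings.

```python
from itertools import combinations


def build_lexical_matrix(records):
    """Turn (language, form, meaning) triples into a binary lexical matrix.

    Returns the sorted list of meanings (columns), the matrix as a dict
    {(language, form): [0/1, ...]}, and the raw meaning sets per item.
    """
    rows = {}
    for language, form, meaning in records:
        rows.setdefault((language, form), set()).add(meaning)
    meanings = sorted({m for ms in rows.values() for m in ms})
    matrix = {key: [int(m in ms) for m in meanings] for key, ms in rows.items()}
    return meanings, matrix, rows


def components(nodes, edges):
    """Label the connected components of the subgraph induced by `nodes`."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}


def infer_map(rows):
    """Greedily add edges until every item's meanings form a connected region.

    An edge is scored by the number of items for which it would join two
    currently separate components; ties are broken alphabetically.
    Returns {(meaning_a, meaning_b): weight}.
    """
    terms = [ms for ms in rows.values() if len(ms) > 1]
    edges = set()
    while True:
        scores = {}
        for ms in terms:
            comp = components(ms, edges)
            for u, v in combinations(sorted(ms), 2):
                if comp[u] != comp[v]:
                    scores[(u, v)] = scores.get((u, v), 0) + 1
        if not scores:  # every item is already connected
            break
        edges.add(max(sorted(scores), key=scores.get))
    # Weight: number of items that colexify both endpoints of the edge.
    return {e: sum(1 for ms in terms if e[0] in ms and e[1] in ms) for e in edges}


# Hypothetical toy data, not drawn from ASJP, CLICS, or any real source.
toy_records = [
    ("A", "x", "SEE"), ("A", "x", "KNOW"),
    ("B", "y", "SEE"), ("B", "y", "KNOW"),
    ("C", "z", "HEAR"), ("C", "z", "KNOW"),
    ("D", "w", "SEE"), ("D", "w", "HEAR"), ("D", "w", "KNOW"),
]
toy_meanings, toy_matrix, toy_rows = build_lexical_matrix(toy_records)
toy_map = infer_map(toy_rows)
```

On this toy input the sketch links KNOW to SEE and KNOW to HEAR but adds no direct SEE–HEAR edge, because the item covering all three meanings is already connected through KNOW. This is one way to read the economy principle: an edge is added only when some item needs it.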
We apply this method to a case study of verbs of perception and cognition (see Appendix for a provisional semantic map) and we enrich the result with additional cross-linguistic data (Zalizniak et al., 2012). The semantic map method allows one to visualize a structured cross-linguistic polysemy network, and to systematically analyze the types of mapping of lexical items onto this network. More specifically, the method allows one to differentiate between common polysemy patterns attested in unrelated languages and shared polysemy patterns, that is, colexification patterns shared among languages in the same area. These results will be compared to (i) geographical and genetic data in order to determine the interaction between lexicalization patterns and areality, on the one hand, and common inheritance, on the other. Our findings will also be compared to (ii) proposed universal generalizations, in order to evaluate their validity and limits, and to (iii) proposed language/culture-specific associations identified in the literature (e.g., Viberg, 1984; Sweetser, 1990; Evans & Wilkins, 2000; Aikhenvald & Storch, 2013), in order to evaluate the degree to which the bottom-up method relying on large language samples matches the results of case studies conducted by experts.
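The distinction between common and areally shared patterns can be approximated, as a first pass, by counting how widely each colexification is spread across families and areas. The sketch below is only an illustration under stated assumptions: the function name `pattern_spread`, the metadata format (one family label and one area label per language), and the reading of the counts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def pattern_spread(terms, metadata):
    """Count languages, families, and areas attesting each colexified pair.

    terms:    iterable of (language, set_of_meanings), one entry per lexical item
    metadata: {language: (family, area)}  (hypothetical format)
    Returns {(meaning_a, meaning_b): (n_languages, n_families, n_areas)}.
    """
    attesting = defaultdict(set)
    for language, meanings in terms:
        for pair in combinations(sorted(meanings), 2):
            attesting[pair].add(language)
    spread = {}
    for pair, languages in attesting.items():
        families = {metadata[lang][0] for lang in languages}
        areas = {metadata[lang][1] for lang in languages}
        spread[pair] = (len(languages), len(families), len(areas))
    return spread
```

A pair attested in several families and several areas is a candidate for a common pattern, whereas a pair attested in several families but a single area is a candidate for an areally shared one. Any actual classification would need thresholds and statistical controls that this sketch does not attempt.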