token-based distributional vectors; word-embeddings; alternation; corpora; language variation
Abstract :
[en] Researchers in usage-based construction grammar are increasingly using distributional vector
modelling, e.g. to study changes in the productivity of syntactic constructions (Perek 2016) or lexical
biases in the choice between syntactic alternants (Pijpops et al. 2018). This technique models the
meaning of a word or a construction by representing the textual contexts of its occurrences in a
representative corpus as a mathematical vector (see Lenci 2018 for an accessible introduction, as well
as further references). Until now, this research has primarily used type-based vectors, whereby each
vector represents the entire polysemy of a lemma or a constructional variant. We aim to add to this
research by employing token-based vectors (Heylen et al. 2015; De Pascale & Zhang 2021). These
vectors overcome the problematic conflation of senses in type-based vectors by representing the
meaning of individual occurrences of a word or a construction as the weighted average of the type-based vectors of the context words of that occurrence.
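The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used in the study: the function name `token_vector`, the toy two-dimensional type vectors and the weights are all hypothetical, and the abstract does not say how the weights are obtained (association scores between target and context word are one common choice).

```python
def token_vector(context_words, type_vectors, weights=None):
    """Return the weighted average of the type-based vectors of the
    context words of one occurrence (hypothetical helper).

    context_words: words around the target token
    type_vectors:  dict mapping a word to its type-based vector (list of floats)
    weights:       optional dict mapping a word to a non-negative weight;
                   words missing from it get weight 0, and if the dict is
                   omitted every context word gets weight 1
    """
    pairs = []
    for word in context_words:
        if word not in type_vectors:
            continue  # skip context words without a type-based vector
        weight = 1.0 if weights is None else weights.get(word, 0.0)
        pairs.append((weight, type_vectors[word]))

    total = sum(weight for weight, _ in pairs)
    if total <= 0:
        raise ValueError("no usable context words for this token")

    dims = len(pairs[0][1])
    return [
        sum(weight * vector[i] for weight, vector in pairs) / total
        for i in range(dims)
    ]


# Toy data, invented for illustration (context words taken from example (1)).
toy_type_vectors = {
    "echt": [1.0, 0.0],
    "betere": [0.0, 1.0],
    "resultaten": [2.0, 0.0],
}
toy_weights = {"echt": 1.0, "betere": 1.0, "resultaten": 2.0}

token = token_vector(["echt", "betere", "resultaten"], toy_type_vectors, toy_weights)
```

Because each occurrence gets its own vector, two tokens of the same verb with different context words end up at different points in the vector space, which is what allows senses to be kept apart.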
The present study uses token-based vectors to investigate the choice between two seemingly
interchangeable constructions (cf. Gries and Stefanowitsch 2004; Dosedlová and Lu 2019). As a case
study, we look at the alternation between the Dutch transitive construction and the so-called naar-construction, as in (1)-(2). We calculate token-based vectors for all occurrences of a verb in either
construction and plot these vectors using Multidimensional Scaling. This allows us to delineate the
semantic space where both constructions overlap, as well as where they diverge.
(1) Ik verlang echt (naar) betere resultaten.
I desire really (to) better results
‘I really desire better results.’
(2) De man greep (naar) een mes en stak een van
The man grabbed (to) a knife and stabbed one of
zijn kameraden twee keer.
his comrades two times
‘The man grabbed a knife and stabbed one of his comrades twice.’
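The step of reducing token-based vectors to two plottable dimensions can be sketched as follows. The abstract does not specify which variant of Multidimensional Scaling or which distance measure is used, so this sketch assumes classical (Torgerson) MDS over cosine distances; the function names, the toy token vectors and the variant labels are invented for illustration.

```python
import numpy as np


def cosine_distances(vectors):
    """Pairwise cosine distances between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def classical_mds(distances, n_components=2):
    """Classical (Torgerson) MDS: embed items in `n_components` dimensions
    so that Euclidean distances approximate the given distance matrix."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the directions with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy token vectors (one row per occurrence) with hypothetical variant labels.
toy_tokens = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.1, 1.0, 0.3],
    [0.0, 0.8, 0.5],
])
toy_variants = ["transitive", "transitive", "naar", "naar"]

coords = classical_mds(cosine_distances(toy_tokens))
for variant, (x, y) in zip(toy_variants, coords):
    print(f"{variant:<10} x={x:+.3f} y={y:+.3f}")
```

The resulting coordinates can be handed to any plotting library and coloured by variant: regions where both labels mix would correspond to overlap between the constructions, and regions dominated by one label to divergence.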
We first focus on the variation of the verb verlangen ‘desire’, which has already been examined
in Pijpops et al. (2021). This allows us to confirm the validity of our technique. Next, we turn
to the variation of the verb grijpen ‘grab’, which has so far not been studied in any real depth. Our
results indicate that token-based distributional vectors can be a useful addition to the methodological
toolbox of researchers in usage-based construction grammar.
Research Center/Unit :
Lilith - Liège, Literature, Linguistics - ULiège
Disciplines :
Languages & linguistics
Author, co-author :
De Pascale, Stefano
Pijpops, Dirk ; Université de Liège - ULiège > Département de langues modernes : linguistique, littérature et traduction ; Université de Liège - ULiège > Lilith - Liège, Literature, Linguistics
Language :
English
Title :
Modelling meaning differences in syntactic alternations with token-based vectors