Using geographic regression to analyse linguistic diversity: The interplay of language-internal and -external factors
2022-04-12, 15:30–16:00 (Europe/Vienna), Room 1

The languages spoken across the world show remarkable diversity. Explaining patterns of language variation is one of the challenges in linguistics, as the drivers of these patterns and the ways in which they interact are not yet fully understood. Linguists have long been aware of the importance of language-external factors such as geographic, environmental, and socio-demographic aspects. Drawing on insights from quantitative ecology, the field currently relies on data coded as distance matrices to investigate diversity (see Zuur et al. 2009 for an overview). However, language-internal factors, such as characteristics related to meaning, have also been shown to influence diversity (AUTHORS 2019a). Because such factors cannot be coded into distance matrices, a new methodology is needed if we want to investigate external and internal drivers of language variation simultaneously.
In a study on linguistic diversity in Japan, we introduced a new approach to linear mixed-effects modelling that treats pairwise comparisons as individual data points (AUTHORS 2019b). This makes it possible to include different types of additional variables for each comparison. Because these variables can be combined in a single model, linear mixed-effects modelling offers a more comprehensive analytical tool than the techniques currently used in dialectometry, and can help us better understand patterns of language variation.
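To make the idea of pairwise comparisons as data points concrete, the sketch below shows one possible layout for such a data set. It assumes, purely for illustration and not necessarily following the design of AUTHORS 2019b, one row per pair of locations per concept, with a binary lexical distance (same or different variant). The `Location` class, the `pairwise_rows` function, all field names, and the use of the absolute difference for population size are hypothetical. The point is that language-external predictors (geographic distance, border, river, population) and language-internal predictors (properties of the concept) sit side by side in the same row.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Location:
    # Hypothetical fields; coordinates are assumed to be projected, in km.
    name: str
    x_km: float
    y_km: float
    country: str          # e.g. "BE" or "NL"
    east_of_meuse: bool   # which bank of the Meuse the location lies on
    population: int


def pairwise_rows(locations, variants, concept, concept_features):
    """Build one data point per location pair for a single concept.

    variants: dict mapping location name -> recorded lexical variant.
    concept_features: language-internal predictors for this concept
    (hypothetical, e.g. semantic field or basic-vocabulary status).
    """
    rows = []
    for a, b in combinations(locations, 2):
        rows.append({
            "loc1": a.name,
            "loc2": b.name,
            "concept": concept,
            # Binary lexical distance: 0 if both places use the same variant.
            "ling_dist": int(variants[a.name] != variants[b.name]),
            # Language-external predictors for this pair.
            "geo_dist_km": hypot(a.x_km - b.x_km, a.y_km - b.y_km),
            "crosses_border": a.country != b.country,
            "crosses_meuse": a.east_of_meuse != b.east_of_meuse,
            # Assumed operationalisation: absolute difference in population.
            "pop_diff": abs(a.population - b.population),
            # Language-internal predictors are copied onto every pair.
            **concept_features,
        })
    return rows
```

Rows built this way for every concept can be stacked into one table and passed to a mixed-effects modelling package, with the linguistic distance as the response and, for instance, random intercepts for the locations and concepts involved; the random-effects structure used in the actual study is not specified here.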
Using this novel method, this paper examines the influence of both language-external and language-internal drivers of linguistic diversity in the Limburgish dialect area in the Netherlands and Belgium. The data come from the Dictionary of the Limburgish Dialects, whose scale and systematicity provide an ideal basis for studying general processes of linguistic variation and change. We focus specifically on data from five semantic fields: (1) ‘the human body’; (2) ‘personality & feelings’; (3) ‘church & religion’; (4) ‘society & education’; and (5) ‘clothing & personal hygiene’. These semantic fields show different levels of variation. Whereas many of the concepts in ‘the human body’ and ‘personality & feelings’ are considered basic vocabulary, and should thus show less variation, the other fields consist largely of culturally variable vocabulary, which should show more variation. In addition, the three culturally variable fields differ in their degree of standardisation: the ‘church & religion’ field is highly standardised; the ‘society & education’ field is standardised within each country, with differences between Belgium and the Netherlands; and the ‘clothing & personal hygiene’ field is the least standardised overall (cf. AUTHORS 2019c).
The first step of the analysis calculates pairwise linguistic distances between all locations in the dialect area. Next, more traditional techniques from ecology, namely Mantel correlation and multiple regression on distance matrices (MRM), are used to analyse the relationship between linguistic distance and four language-external factors: (1) geographic distance; (2) separation by a large body of water (the Meuse river); (3) separation by the national border between Belgium and the Netherlands; and (4) differences in population size. These analyses are conducted separately for each semantic field, to examine whether the influence of these factors differs between fields. Finally, linear mixed-effects modelling is used to examine the additional role of semantic factors in linguistic diversity.
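The first two steps can be sketched in plain Python under a simplifying assumption about the distance measure: the share of concepts attested at both locations for which the two locations use a different lexical variant. The abstract does not specify the measure, and string-based measures such as Levenshtein distance are a common alternative. The hypothetical `lexical_distance_matrix` function builds this matrix, and the hypothetical `mantel` function computes a basic Mantel statistic (the Pearson correlation between the upper triangles of two distance matrices) with a one-sided p-value obtained by permuting the locations of one matrix. This is a minimal illustration, not the implementation used in the study.

```python
import random
from math import sqrt


def lexical_distance_matrix(data):
    """Pairwise lexical distances between locations (hypothetical data format).

    data: dict mapping location -> dict mapping concept -> lexical variant.
    Distance = share of concepts attested at both locations whose variants differ.
    """
    locs = sorted(data)
    n = len(locs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = data[locs[i]], data[locs[j]]
            shared = a.keys() & b.keys()
            if not shared:
                raise ValueError(f"no shared concepts for {locs[i]} and {locs[j]}")
            differing = sum(a[c] != b[c] for c in shared)
            d[i][j] = d[j][i] = differing / len(shared)
    return locs, d


def upper_triangle(m, order):
    """Entries above the diagonal of m, with rows/columns relabelled by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel test: observed r and one-sided permutation p-value (r >= observed)."""
    n = len(d1)
    identity = list(range(n))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of d2 together
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

MRM extends the same logic to several explanatory matrices at once (for example geographic distance plus 0/1 matrices coding whether two locations are separated by the border or by the Meuse), regressing the linguistic distance matrix on all of them and assessing significance by permuting the response matrix.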


AUTHORS 2019a,b,c
Zuur AF, Ieno EN, Walker N, Saveliev AA & Smith GM (2009). Mixed Effects Models and Extensions in Ecology with R. New York: Springer.