Barbiers, S., Bennis, H., De Vogelaer, G., Devos, M. and van der Ham, M. (2005). Syntactische Atlas van de Nederlandse Dialecten Volume 1.
Barbiers, S., van der Auwera, J., Bennis, H., Boef, E., De Vogelaer, G. and van der Ham, M. (2008). Syntactische Atlas van de Nederlandse Dialecten Volume 2.
Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL, 30, 31-40.
Goebl, H. (2006). Recent advances in Salzburg dialectometry. Literary and linguistic computing, 21(4), 411-435.
Grieve, J. (2018). Spatial statistics for dialectology. In Boberg, C., Nerbonne, J. & Watt, D. (eds.) The handbook of dialectology, 415-433.
Heeringa, W., & Nerbonne, J. (2001). Dialect areas and dialect continua. Language variation and change, 13(3), 375-400.
Nerbonne, J., & Wieling, M. (2018). Statistics for aggregate variationist analyses. In Boberg, C., Nerbonne, J. & Watt, D. (eds.) The Handbook of Dialectology, 400-414.
Ord, J. K., & Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical analysis, 27(4), 286-306.
Pickl, S. (2016). Fuzzy dialect areas and prototype theory: Discovering latent patterns in geolinguistic variation. In Côté, M.-H., Knooihuizen, R. & Nerbonne, J. (eds.) The future of dialects, 75-98.
Prokic, J., & Nerbonne, J. (2008). Recognising groups among dialects. International journal of humanities and arts computing, 2(1-2), 153-172.
Prokic, J., Çöltekin, Ç., & Nerbonne, J. (2012). Detecting shibboleths. In Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH (pp. 72-80).
Sung, H. W. M. & Prokic, J. (2024). A Comparison between 3 Feature Extraction Methods in Dialectometry. Computational Linguistics in the Netherlands Journal, Vol. 13.
Wieling, M., & Nerbonne, J. (2015). Advances in dialectometry. Annual Review of Linguistics, 1(1), 243-264.
One of the primary interests in dialectometry, a quantitative approach to dialectology, is the classification of dialects in a given area (Goebl 2006, Wieling and Nerbonne 2015). By using computational and statistical approaches, it is possible to aggregate a large number of dialect features in order to understand the structure of linguistic variation and the relationship between the dialects.
One of the tools which dialectometrists use to seek partitions of dialect groups is cluster analysis (Prokic and Nerbonne 2008, Nerbonne and Wieling 2018), based on the dialect distances calculated between all the pairs of dialects in the data. An identified cluster implies a group of dialects which are relatively similar to each other, and rather distant from dialects outside the group (Nerbonne and Wieling 2018: 401). The result from the analysis resembles the traditional notion of a dialect area (Heeringa and Nerbonne 2001).
Cluster analysis has been criticised on two counts: it cannot identify the linguistic features that characterise each cluster, and it cannot identify “exclusively dominant areas, subordinate, non-dominants areas that are determined by smaller numbers of features” (Pickl 2016: 81). In other words, cluster analysis does not indicate which dialects are more ‘typical’ of a cluster, nor does it show the gradual change from one ‘core’ area to another.
Pickl’s (2016) first criticism applies not only to cluster analysis but also to the aggregation step in dialectometry, since dialect relationships are reduced to a single numeric (distance) value (Sung and Prokic 2024). To identify characteristic features for individual dialect groups, Sung and Prokic (2024) have proposed using Normalised Pointwise Mutual Information (nPMI) (Bouma 2009). Unlike previously proposed methods (e.g. Factor Analysis (Pickl 2016) and a method similar to Fisher’s Linear Discriminant (Prokic et al. 2012)), nPMI seeks the most exclusive features by simultaneously considering the probability of a feature within a set of competing features (such as a word position) and the probability of a locality belonging to a specific cluster.
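As a rough illustration of the measure, the sketch below computes nPMI following Bouma's (2009) definition, i.e. pointwise mutual information divided by the negative log of the joint probability. The input layout (one (variant, cluster) pair per locality for a single item), the function name npmi_table and the toy data are assumptions made for this example, not the implementation used by Sung and Prokic (2024).

```python
import math
from collections import Counter


def npmi_table(observations):
    """Return nPMI for every attested (variant, cluster) pair.

    `observations` is a list of (variant, cluster) pairs, one per locality,
    all for the same item (hypothetical input layout for this sketch).
    """
    n = len(observations)
    joint = Counter(observations)
    variant_counts = Counter(variant for variant, _ in observations)
    cluster_counts = Counter(cluster for _, cluster in observations)

    scores = {}
    for (variant, cluster), count in joint.items():
        p_xy = count / n
        p_x = variant_counts[variant] / n
        p_y = cluster_counts[cluster] / n
        if p_xy == 1.0:
            # Degenerate case: one variant, one cluster. nPMI is 0/0 here;
            # returning 0.0 is a convention assumed for this sketch.
            scores[(variant, cluster)] = 0.0
            continue
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(variant, cluster)] = pmi / -math.log(p_xy)
    return scores


# Toy data: variant "a" occurs only in the north, "b" only in the south.
exclusive = [("a", "north"), ("a", "north"), ("b", "south"), ("b", "south")]
# Same data with one southern locality that also uses "a".
mixed = exclusive + [("a", "south")]

exclusive_scores = npmi_table(exclusive)
mixed_scores = npmi_table(mixed)
```

In this toy data, a variant that occurs in exactly one cluster, and is the only variant there, reaches the maximum of 1, while a variant that is under-represented in a cluster receives a negative score.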
To address Pickl’s (2016) second criticism, we make use of the features extracted using nPMI. To find the ‘focal’ area within a cluster, we calculate, for each locality, a degree of dialect typicality based on the cluster’s typical features (extracted with nPMI). We then identify concentrations of dialects with high typicality using local spatial autocorrelation (Getis-Ord Gi*, Ord and Getis 1995; Grieve 2018). The z-scores from the local spatial autocorrelation analysis are represented on a hot spot map, where ‘hot spots’ mark concentrations of high values and ‘cold spots’ mark concentrations of low values. The ‘hot spot’ represents the kernel of the dialect area, while areas with lower z-scores represent transitional dialects. This approach allows researchers to enhance cluster analysis by a) identifying typical features for each cluster and b) identifying focal and transitional dialect areas. In this talk, we will demonstrate the approach with a dialectometric analysis of the Syntactic Atlas of the Dutch Dialects (Vol. 1 and 2, Barbiers et al. 2005, Barbiers et al. 2008).
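The two steps can be sketched as follows. The typicality measure shown here (the share of a cluster's typical features attested at a locality) is only an assumed stand-in, since the abstract does not spell out the exact calculation. The Gi* function follows the standard z-score form of the statistic from Ord and Getis (1995) with binary distance-band weights; the radius, coordinates and function names are likewise illustrative assumptions.

```python
import math


def typicality(attested, typical_features):
    """Share of a cluster's typical features attested at one locality.

    Assumed measure for illustration; both arguments are sets of feature labels.
    """
    if not typical_features:
        return 0.0
    return len(attested & typical_features) / len(typical_features)


def getis_ord_gi_star(values, coords, radius):
    """Return one Gi* z-score per locality, using binary distance-band weights.

    A locality counts as its own neighbour, which is what distinguishes
    Gi* from Gi.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    s = math.sqrt(max(variance, 0.0))

    z_scores = []
    for i in range(n):
        weights = [
            1.0 if math.dist(coords[i], coords[j]) <= radius else 0.0
            for j in range(n)
        ]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum * w_sum) / (n - 1))
        # Undefined when values are constant or every locality is a neighbour;
        # 0.0 is a fallback chosen for this sketch.
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Toy transect of six localities: typicality is high in the west, low in the east.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
z = getis_ord_gi_star(values, coords, radius=1.0)
```

On this toy transect the z-scores are positive in the west and negative in the east, and they shrink towards zero at the localities next to the boundary, which is the pattern read above as a transition between a kernel and its neighbouring area.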