Coherence and lexical variation in Neostandard Italian: A sociolectometric analysis
2022-04-14, 10:00–10:30 (Europe/Vienna), Room 2

The research reported in this paper is part of a larger endeavor to understand the underlying structure of language varieties. The past decades have witnessed the emergence of intermediate varieties between dialect and standard language (also known as neostandard varieties, e.g. tussentaal in Flanders, italiano neostandard in Italy, see Auer 2017). Many of them seem to be far more uniform than was initially assumed (Guy & Hinskens 2016; Grondelaers et al. 2011; Cerruti & Vietti 2019). However, their coherence has generally been investigated through the distribution of morpho-syntactic features. Very little is known about the role of lexical variation.

In this paper, we dig further into the theoretical question of coherence by investigating lexical variation, here defined as the use of different names for the same conceptual category (Geeraerts 2017). This perspective on lexical variation allows us to operationalize near-synonymous lexical choice as a sociolinguistic variable in the Labovian sense, that is, as “a set of alternative ways of ‘saying the same thing’” (Labov 1969). Furthermore, in order to address the coherence of lexical choice, we analyze the variation using the methods and notions of sociolectometry (Geeraerts et al. 1999), which approaches the structure of language varieties (i.e. lects) by studying aggregate-level lexical distances between them.
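The idea of an aggregate-level lexical distance between lects can be illustrated with a minimal sketch. It assumes a profile-based measure (half the city-block distance between the relative frequencies of a concept's variants, averaged over concepts without weighting); the abstract does not specify which measure will be used, and the function names and counts below are hypothetical toy values, not corpus results.

```python
def profile(counts):
    """Relative frequencies of the variants naming one concept in one lect."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def city_block_distance(counts_a, counts_b):
    """Half the summed absolute difference between two profiles (0 = identical, 1 = disjoint)."""
    p, q = profile(counts_a), profile(counts_b)
    variants = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in variants)


def aggregate_distance(lect_a, lect_b):
    """Unweighted mean of the per-concept distances over concepts attested in both lects."""
    shared = [concept for concept in lect_a if concept in lect_b]
    if not shared:
        raise ValueError("the two lects share no concepts")
    return sum(city_block_distance(lect_a[c], lect_b[c]) for c in shared) / len(shared)


# Hypothetical toy counts (assumption: invented for illustration only).
spoken = {
    "ABLE": {"abile": 2, "capace": 8},
    "DISCUSS": {"dibattere": 1, "discutere": 9},
}
written = {
    "ABLE": {"abile": 5, "capace": 5},
    "DISCUSS": {"dibattere": 3, "discutere": 7},
}

print(round(aggregate_distance(spoken, written), 3))
```

In practice, concepts are often weighted by their frequency rather than averaged uniformly; the unweighted mean here is a simplification.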

In our study, the focus will lie on lexical variation in Neostandard Italian. Although our main focus is on spoken Italian, for the sake of the sociolectometric analysis we will compare our data with written Italian. The KIParla corpus (Mauri et al. 2019) will provide data for spoken academic language from Turin and Bologna. The corpus contains several types of interaction recorded in different university settings (e.g. conversations, lessons, interviews) and is stratified by gender, age, and region of origin. The strong regional stratification allows us to include a tightly controlled regional dimension in our analysis. The data for the written counterpart will be extracted from the academic section of the CORIS corpus (Rossini Favretti et al. 2002).

To collect a large set of lexical sociolinguistic variables, we will apply type-based distributional semantic methods, using a technique for synonymy retrieval developed in De Pascale (2019). We will first study adjectival and verbal variables (e.g. ABLE { abile, capace }, DISCUSS { dibattere, discutere }), as these word classes are likely to introduce less topical bias into our analyses (given that the topic ought to stay stable across the varieties under comparison).
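As a rough illustration of how type-based distributional vectors can surface near-synonym candidates, the sketch below ranks words by the cosine similarity of their co-occurrence vectors. It is a generic example, not the retrieval technique of De Pascale (2019), whose details are not given here; the function names and the co-occurrence counts are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {context: weight} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(target, vectors, k=1):
    """Return the k words whose vectors are most similar to the target's vector."""
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    return [word for _, word in sorted(scored, reverse=True)[:k]]


# Hypothetical toy co-occurrence counts (assumption: invented for illustration only).
toy_vectors = {
    "abile": {"persona": 4, "lavoro": 3, "molto": 2},
    "capace": {"persona": 5, "lavoro": 2, "molto": 3},
    "discutere": {"tesi": 4, "problema": 3, "lavoro": 1},
}

print(nearest_neighbours("abile", toy_vectors))
```

Candidates retrieved this way would still need manual or semantic checking before being grouped into a variable such as ABLE { abile, capace }, since distributional neighbours are not always synonyms.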

The sociolectometric operationalization envisaged here allows us to explore how language-internal and language-external dimensions shape lexical variation in Neostandard Italian, by modelling these as predictors in a mixed-effects regression analysis, which is the state-of-the-art technique in sociolectometry (Wieling et al. 2014). Moreover, this type of regression allows us to detect the relative influence of the individual variables, thereby providing a more granular view of the structure of Neostandard Italian beneath the aggregate-level measurements.

Panel affiliation

Sociolinguistic variation in contemporary Italian


Auer, P. 2011. Dialect vs. standard. A typology of scenarios in Europe. In B. Kortmann & J. Van der Auwera (eds.), The languages and linguistics of Europe. A comprehensive guide. Berlin & Boston: De Gruyter Mouton.
Cerruti M. & A. Vietti. 2019. Co-occurrence of neo-standard features in spoken Italian: a corpus-based study, Oral presentation at the 10th International Conference on Language Variation in Europe (ICLAVE), Fryske Akademy, Leeuwarden/Ljouwert, Netherlands.
De Pascale S. 2019. Token-based vector space models as semantic control in lexical lectometry. KU Leuven: Unpublished doctoral dissertation.
Geeraerts, D. 2017. Entrenchment as Onomasiological Salience. In H.-J. Schmid (Ed.), Entrenchment and the Psychology of Language Learning. (pp. 153–174). De Gruyter Mouton.
Geeraerts, D., Grondelaers, S., & D. Speelman. 1999. Convergentie en divergentie in de Nederlandse woordenschat: een onderzoek naar kleding- en voetbaltermen. P.J. Meertens-Instituut.
Guy G. & F. Hinskens. 2016. Linguistic coherence: Systems, repertoires and speech communities. Lingua 172-173. 1-9.
Grondelaers, S., R. Van Hout & D. Speelman. 2011. A perceptual typology of standard language situations in the Low Countries. In T. Kristiansen & N. Coupland (eds.), Standard Languages and Language Standards in a Changing Europe, 199-222. Oslo: Novus Press.
Labov, W. 1969. Contraction, Deletion, and Inherent Variability of the English Copula. Language 45(4). 715–762.
Mauri, C., S. Ballarè, E. Goria, M. Cerruti & F. Suriano. 2019. KIParla corpus: a new resource for spoken Italian. In Bernardi, R., R. Navigli & G. Semeraro (eds.), Proceedings of the 6th Italian Conference on Computational Linguistics CLiC-it.
Rossini Favretti R., F. Tamburini & C. De Santis. 2002. CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model. In Wilson, A., P. Rayson & T. McEnery (eds.), A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, Lincom-Europa, Munich.
Wieling, M., S. Montemagni, J. Nerbonne & H. Baayen. 2014. Lexical differences between Tuscan dialects and standard Italian: A sociolinguistic analysis using generalized additive mixed modeling. Language 90(3). 669-692.

Stefania Marzo is Associate Professor of Italian Linguistics at the University of Leuven. Her research interests broadly fall into the area of variationist sociolinguistics and contact linguistics. She focuses specifically on the diffusion of urban vernaculars in Flanders, on contact and leveling in heritage Italian in Europe, and on the dynamics of (re)standardization in Italy. On a methodological level, she combines corpus-based production research with experimental methods in order to investigate the social meaning of language variation.