What can big data tell us about the social meaning of language variation? A case study on socially meaningful spelling variation in English iclave11

2022-04-14, 10:00–10:30 (Europe/Vienna), Room 3

https://univienna.zoom.us/j/62425851719

Spelling variation is abundant in written language use on social media platforms. Crucially, many of the non-conventional spellings found there are not misspellings. Various studies have analyzed the patterns and functions of online spelling variation (e.g. Tatman 2015 and Ilbury 2020 for Twitter) and have suggested a strong connection between phonological and orthographic variation (Eisenstein 2015). This suggests that spelling variation can be used, like other forms of linguistic variation, to express aspects of the language user’s social identity (Sebba 2007). Yet little quantitative research has been carried out on the social meanings of spelling variants. This study aims to help fill this descriptive lacuna in sociolinguistic research. We do so by comparing the social meanings of spelling variants, elicited through human experiments, to data-driven meaning representations automatically learnt from large corpora. As such, the study supplements its descriptive aim with a methodological one: to what extent can traditional sociolinguistic ‘small data’ approaches and recent NLP-based ‘big data’ approaches complement each other?

In this paper, we focus on spelling variation on the popular online platform Twitter. We look at two types of spelling variation phenomena in British English: (1) spelling variation representing phonetic variation (e.g. alveolar vs. velar pronunciation of ING as in workin vs. working), and (2) spelling variation restricted to the orthographic level (e.g. flooding of characters as in fun vs. funnnn).

First, the social meaning of the linguistic variants is measured experimentally in a written version of the speaker evaluation paradigm (cf. Leigh 2018). Stimuli for this experiment are selected from a Twitter corpus controlled for region (i.e. the London metropolitan area). A sample of participants (N = 120) geographically matched to the producers of the corpus data is presented with a series of tweets containing the linguistic variants under study. Participants are asked to rate the personality of the writer on a series of semantic differential scales representing various social traits that have been shown to be potentially associated with the social meanings of the targeted linguistic variation.

Second, we compare our experimental measurements of social meanings with word embeddings, i.e. automatically learnt mappings from words to high-dimensional vectors based on co-occurrences in the Twitter corpus (Mikolov et al. 2013; Smith 2020). We do so using a computational analysis of the linguistic variants in the embedding space, by measuring the distances between the variants and clustering them based on their embeddings. For example, are linguistic variants that received similar ratings in the human experiments clustered together in the embedding space?
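The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors, the mean ratings, the distance threshold and the helper names (cosine_distance, single_linkage_clusters, pearson) are all hypothetical and chosen only for illustration. The sketch computes cosine distances between variants, groups variants that lie close together in the embedding space, and then checks whether embedding distances and rating distances tend to agree.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (real word embeddings have hundreds of dimensions).
embeddings = {
    "working": [0.9, 0.1, 0.0],
    "workin":  [0.7, 0.6, 0.1],
    "talkin":  [0.6, 0.7, 0.1],
    "fun":     [0.1, 0.2, 0.9],
    "funnnn":  [0.2, 0.1, 1.0],
}

# Hypothetical mean ratings per variant on two semantic differential scales.
ratings = {
    "working": [5.8, 3.1],
    "workin":  [3.2, 5.4],
    "talkin":  [3.0, 5.6],
    "fun":     [4.5, 4.4],
    "funnnn":  [3.9, 6.1],
}

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def single_linkage_clusters(vectors, threshold):
    """Group words linked, directly or transitively, by a distance below threshold."""
    parent = {word: word for word in vectors}

    def find(word):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for a, b in combinations(vectors, 2):
        if cosine_distance(vectors[a], vectors[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for word in vectors:
        clusters.setdefault(find(word), set()).add(word)
    return sorted(clusters.values(), key=sorted)

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Pairwise distances in the embedding space and in the rating space.
pairs = list(combinations(embeddings, 2))
emb_dists = [cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs]
rating_dists = [math.dist(ratings[a], ratings[b]) for a, b in pairs]

print(single_linkage_clusters(embeddings, threshold=0.1))
print(pearson(emb_dists, rating_dists))
```

A positive correlation between the two sets of pairwise distances would indicate that variants rated similarly by participants also tend to lie close together in the embedding space. With real data, a permutation-based test on the distance matrices would be more appropriate than a plain correlation, because pairwise distances are not independent observations.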

Our paper brings novel insights into the social meaning of spelling variation. It furthermore draws attention to opportunities and limitations of data-driven meaning representations for sociolinguistic research on language variation.

References

Eisenstein, Jacob. 2015. Systematic patterning in phonologically-motivated orthographic variation. Journal of Sociolinguistics 19(2): 161–188.
Ilbury, Christian. 2020. “Sassy Queens”: Stylistic orthographic variation in Twitter and the enregisterment of AAVE. Journal of Sociolinguistics 24(2): 245–264.
Leigh, Daisy. 2018. Expecting a performance: Listener expectations of social meaning in social media. Paper presented at NWAV 47 [New Ways of Analyzing Variation], New York, 20 October 2018.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado & Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems: 3111–3119.
Sebba, Mark. 2007. Spelling and society: The culture and politics of orthography around the world. Cambridge: CUP.
Smith, Noah A. 2020. Contextual word representations: Putting words into computers. Communications of the ACM 63(6): 66–74.
Tatman, Rachel. 2015. #go awn: Sociophonetic variation in variant spellings on Twitter. Working Papers of the Linguistics Circle of the University of Victoria 25(2).

Dong Nguyen

Laura Rosseel

Laura Rosseel is assistant professor in Dutch language and linguistics at Vrije Universiteit Brussel. Her research interests mainly lie in the fields of (developmental) sociolinguistics, language variation and change, and experimental linguistics. Her PhD research focused on innovating the measurement of the social meaning of language variation. More specifically, she studied a number of implicit attitude measures recently developed in social psychology and investigated whether these measures can be adapted to study language attitudes. In her current work, Laura is further applying these new methods to study the social meaning of language variation in various speech communities, as well as to measure the acquisition of language attitudes in children and adults.
