2022-04-13, 10:00–10:30 (Europe/Vienna), Room 4
A long-standing body of work (e.g. [1,2,3,4]) has compared different vowel-normalization methods for sociophonetic research. Except for , most of this prior research focused on steady-state vowels, taking e.g. the vowel midpoint as a single representative sample. In light of concerns about the validity of this approach ([2,5,6]) and related observations that even monophthongal phonemes can be phonetically time-dynamic ([2,6,7,8]), we revisit the issue of vowel-normalization techniques, taking temporal dynamics into explicit consideration. Our comparison is based on ’s hand-corrected measurements of the Dutch ‘teacher corpus’ [9,10]. This corpus provides word-list data from 160 speakers from eight regions of Standard Dutch (four in the Netherlands and four in Flanders), stratified by sex (male/female) and age (old/young), making five individuals per cell. On the basis of these data, we compared sixteen contemporary normalization methods, as implemented in Visible Vowels ([11]), and an unnormalized baseline. We compared performance along four domains:
(1) Normalizing anatomical variation, operationalized by the ability to recover the speaker sexes (cf. [1,2]) and the individual speakers on the basis of the normalized F0–F3; (2) Retaining vowel distinctions and the participants’ regional origins in the normalized F0–F3; (3) The amount of variation in the normalized F0–F3 that could be explained by the separate factors of sex, vowel, region, and their interactions; (4) The total explained variance in the vowel trajectories per region, possibly including by-speaker random effects.
(1) and (2) were operationalized using multinomial GAMs ([12]), modeling the vowels’ temporal trajectories using nonlinear smoothing splines through five timepoints (from 25% to 75% in equidistant steps; ), taking the classification accuracy as our metric of interest. (3) was operationalized using multivariate normal GAMs, where we computed partial-eta-squared values (cf. [1,2]) for the eight factors of interest. (4) was operationalized using univariate GAMs, with R² as our metric of interest. For (4), we fitted models both with and without by-speaker random effects, expecting a major contribution of random effects for the unnormalized dataset, but a more modest contribution for the normalized data. We expected this because normalization should already have eliminated some (anatomical) between-speaker variation, whereas in the unnormalized data this variation is left for the random effects to absorb.
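As an illustration of two quantities mentioned above, the following minimal Python sketch computes the five equidistant sampling points between 25% and 75% of the vowel duration and the classical ANOVA-style partial eta squared, SS_effect / (SS_effect + SS_error). The function names are hypothetical, and the sums of squares in the example are made-up numbers; how the effect sizes were derived from the fitted GAMs is not specified here, so this shows only the textbook definition.

```python
def sample_points(start=0.25, end=0.75, n=5):
    """Return n equidistant proportional timepoints from start to end (inclusive)."""
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def partial_eta_squared(ss_effect, ss_error):
    """Textbook partial eta squared: the effect's sum of squares as a share of
    the effect plus error sums of squares (assumed definition, see lead-in)."""
    return ss_effect / (ss_effect + ss_error)


if __name__ == "__main__":
    # Five timepoints: 25%, 37.5%, 50%, 62.5% and 75% of the vowel duration.
    print(sample_points())  # [0.25, 0.375, 0.5, 0.625, 0.75]
    # Illustrative (invented) sums of squares for one factor.
    print(partial_eta_squared(30.0, 70.0))
```

A higher partial eta squared for vowel or region after normalization would indicate that more of the remaining variation is attributable to that factor rather than to residual noise.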
Our results partly reproduce the observations by [1,2] that Lobanov, together with Gerstman and Nearey I, performs excellently on the classification tasks (domains (1) and (2)). However, other methods, particularly Heeringa & Van de Velde II, achieve higher effect sizes (domain (3)) at only marginally worse classification performance. In domain (4) we observe that all methods but one (Thomas & Kendall) accrue a similar benefit from the inclusion of random effects. Importantly, this includes the baseline, showing that random effects do not replace normalization; rather, these two methodological decisions complement each other. Finally, we note a general advantage of log-mean and centroid normalization methods, which we interpret with reference to their consideration of temporal dynamics.
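For readers unfamiliar with the Lobanov method named above, a minimal sketch follows. It z-scores each formant within each speaker, which is the core of Lobanov normalization. The function name, the token layout (dictionaries with `speaker`, `F1`, `F2` keys) and the toy formant values are assumptions made for illustration; the use of the sample standard deviation is likewise an assumption, and the Visible Vowels implementation, including its handling of multiple timepoints, may differ.

```python
from collections import defaultdict
from statistics import mean, stdev


def lobanov_normalize(tokens, formants=("F1", "F2")):
    """Z-score each formant within each speaker (Lobanov-style normalization).

    tokens: list of dicts with a 'speaker' key and one numeric value per formant.
    Returns a new list of dicts; the input is left unchanged.
    """
    by_speaker = defaultdict(list)
    for tok in tokens:
        by_speaker[tok["speaker"]].append(tok)

    # Per-speaker mean and sample standard deviation for every formant.
    stats = {
        spk: {f: (mean(t[f] for t in toks), stdev(t[f] for t in toks))
              for f in formants}
        for spk, toks in by_speaker.items()
    }

    normalized = []
    for tok in tokens:
        out = dict(tok)
        for f in formants:
            mu, sd = stats[tok["speaker"]][f]
            out[f] = (tok[f] - mu) / sd
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Invented toy data: two speakers with differently scaled vowel spaces.
    toy = [
        {"speaker": "A", "vowel": "i", "F1": 300.0, "F2": 2200.0},
        {"speaker": "A", "vowel": "e", "F1": 500.0, "F2": 1500.0},
        {"speaker": "A", "vowel": "a", "F1": 700.0, "F2": 800.0},
        {"speaker": "B", "vowel": "i", "F1": 400.0, "F2": 2500.0},
        {"speaker": "B", "vowel": "e", "F1": 600.0, "F2": 1700.0},
        {"speaker": "B", "vowel": "a", "F1": 800.0, "F2": 900.0},
    ]
    for row in lobanov_normalize(toy):
        print(row["speaker"], row["vowel"], row["F1"], row["F2"])
```

In this toy example the two speakers end up with identical normalized values per vowel, which is the kind of anatomical variation that domain (1) expects a normalization method to remove while keeping the vowel distinctions of domain (2).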
[1] Adank, P., Smits, R., & Hout, R. van (2004). A comparison of vowel normalization procedures for language variation research. The Journal of the Acoustical Society of America, 116(5), 3099-3107.
[2] Harst, S. van der (2011). The vowel space paradox: A sociophonetic study on Dutch. LOT.
[3] Hindle, D. (1978). Approaches to formant normalization in the study of natural speech. In D. Sankoff (ed.), Linguistic Variation: Models and Methods. Academic.
[4] Disner, S. F. (1980). Evaluation of vowel normalization procedures. The Journal of the Acoustical Society of America, 67(1), 253-261.
[5] Clopper, C. G., Pisoni, D. B., & De Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. The Journal of the Acoustical Society of America, 118(3), 1661-1676.
[6] Hillenbrand, J. M., Clark, M. J., & Nearey, T. M. (2001). Effects of consonant environment on vowel formant patterns. The Journal of the Acoustical Society of America, 109(2), 748-763.
[7] Recasens, D., & Espinosa, A. (2009). Dispersion and variability in Catalan five and six peripheral vowel systems. Speech Communication, 51(3), 240-258.
[8] Heuvel, H. van den, Cranen, B., & Rietveld, T. (1996). Speaker variability in the coarticulation of /a,i,u/. Speech Communication, 18(2), 113-130.
[9] Adank, P. M. (2003). Vowel normalization: a perceptual-acoustic study of Dutch vowels. LOT.
[10] Hout, R. van, Schutter, G. de, Crom, E. de, Huinck, W., Kloots, H., & Van de Velde, H. (1999). De uitspraak van het Standaard-Nederlands. Variatie en varianten in Vlaanderen en Nederland. In Huls, E. & Weltens, B. (eds.), Artikelen van de derde sociolinguïstische conferentie. Eburon.
[11] Heeringa, W. & Van de Velde, H. (2018). Visible Vowels: a Tool for the Visualization of Vowel Variation. In Proceedings CLARIN Annual Conference 2018, 8-10 October, Pisa, Italy, pp. 120-123. CLARIN ERIC.
[12] Wood, S. N. (2017). Generalized additive models: an introduction with R (2nd edn). CRC Press.
Hans Van de Velde is chair of sociolinguistics at Utrecht University and senior researcher at the Fryske Akademy in Leeuwarden/Ljouwert (the Netherlands), where he organised ICLaVE|10. He is a specialist in language variation and change, sociophonetics and standardization processes, and has worked extensively on regional variation in Dutch and on the characteristics of /r/. At the Fryske Akademy he researches the Frisian minority language, Dutch and the mixed varieties spoken in the province of Fryslân. He is also responsible for the digital research infrastructure for Frisian and the development of Frisian language tools, such as the online Dutch-Frisian dictionary, spell checkers, automatic translation and speech recognition.