Variation with and without prescriptivism: The case of the Greek word for ‘coronavirus’
With the advent of the COVID-19 pandemic, several Greek linguists and folk linguists raised the question of which form of the Greek loanword for ‘coronavirus’ is the “correct” one. We have consulted 74 metalinguistic texts on this issue, all published online between January 22 and April 3, 2020. Four variants have been prescribed (κορον-ο-ϊός, κορων-ο-ϊός [koronoiós], κορον-α-ϊός, and κορων-α-ϊός [koronaiós]), varying with respect to (1) the spelling of the first constituent of the compound (<ο> or <ω>) and (2) the linking ‘thematic vowel’ (-o- or -a-).
Our approach improves on the two-corpora design described by Auer (2006). We have traced the use of the prescribed variants in multiple monitor corpora consisting mainly of texts published on news websites, mined with web scraping techniques and analyzed using NLP libraries in a Python programming environment. Monitor corpora allow the study of prescriptivism in (almost) real time, providing valuable insights into its workings. Phase 1 of our corpus (May 2013 and April-May 2014) consists of 71,343 texts (totaling 21,807,666 words), drawn from 6 news websites. The references in this period are to other coronaviruses such as MERS-CoV and SARS-CoV, and variation during this phase was not addressed by prescriptivists. Phase 2 (December 2019 - April 2020), the focus of our study, consists of a total of 123,250 texts (42,204,247 words), drawn from the same 6 news websites. Finally, Phase 3 consists of 20,706 texts (7,037,279 words) drawn from a much larger set of 18 websites over a later period (May 26 – June 1, 2020). We have also analyzed a Phase 3 corpus of 12,039 tweets and a Phase 3 corpus of radio broadcasts (19h 21m in total).
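As an illustration of how the four variants could be traced in scraped texts, the following is a minimal sketch and not the pipeline actually used in the study. It assumes plain-text input and uses only the Python standard library. The names `VARIANT`, `strip_marks` and `count_variants` are hypothetical. Accents and the diaeresis are removed before matching so that inflected and upper-case forms (e.g. κορονοϊού, ΚΟΡΩΝΟΪΟΣ) are counted with their base variant.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical pattern: group 1 is the spelling of the first constituent (ο/ω),
# group 2 is the linking 'thematic vowel' (ο/α). It is applied to text from
# which accents and the diaeresis have already been stripped.
VARIANT = re.compile(r"κορ([οω])ν([οα])ι[οω]\w*")


def strip_marks(text):
    """Lower-case the text and drop combining marks (accent, diaeresis)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_variants(texts):
    """Count tokens per (spelling, thematic vowel) pair over an iterable of texts."""
    counts = Counter()
    for text in texts:
        for match in VARIANT.finditer(strip_marks(text)):
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Toy sentences invented for illustration; they are not corpus data.
    sample = [
        "Ο κορονοϊός εξαπλώνεται.",
        "Κρούσματα κορωνοϊού και κορονοϊού.",
        "Ο κοροναϊός στην Ευρώπη.",
    ]
    for (spelling, vowel), n in sorted(count_variants(sample).items()):
        print(spelling, vowel, n)
```

Because the pattern requires the second constituent (-ιός) after the linking vowel, neological compounds built on the first constituent alone would not be counted and would need a separate pattern.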
Our statistical analysis shows:
1. A radical shift in usage between Phase 1 and Phase 2, suggesting a strong influence of prescriptivism.
2. Phase 2 and Phase 3 variation is increasingly compartmentalized (consistent use ranging from 68.18% to 88.84% in Phase 2 and from 73.76% to 98.44% in Phase 3).
3. The variants with ‘thematic’ -o- are preferred over variants with -a- (87.91% vs. 12.07% respectively in Phase 3). The thematic -o- prevails in radio broadcasts (97.8%) and also in neological compounds (analysis in progress).
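The percentages above can be derived from per-variant token counts. The sketch below shows one possible computation under two assumptions that are ours and not stated in the abstract: that counts are keyed by (spelling, thematic vowel), and that "consistent use" for a source is the share of its most frequent variant. The function names and the toy counts are hypothetical and do not reproduce the reported results.

```python
from collections import Counter


def vowel_shares(counts):
    """Percentage of tokens per thematic vowel, pooling over spellings."""
    totals = Counter()
    for (_spelling, vowel), n in counts.items():
        totals[vowel] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {vowel: 100 * n / grand_total for vowel, n in totals.items()}


def consistency(counts):
    """Assumed measure: share (in %) of the most frequent variant in one source."""
    total = sum(counts.values())
    return 100 * max(counts.values()) / total if total else 0.0


if __name__ == "__main__":
    # Toy counts for a single hypothetical source, invented for illustration.
    toy = {("ο", "ο"): 6, ("ω", "ο"): 2, ("ο", "α"): 2}
    print(vowel_shares(toy))
    print(consistency(toy))
```

Computing the consistency value separately for each website and taking the minimum and maximum would give a range of the kind reported in point 2, if the assumed definition matches the one used in the study.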
Our analysis supports the following claims:
a. Metalinguistic discourse is itself variable. Although prescriptivism aims at eliminating variation, it could also introduce or foster some variation, albeit in a highly compartmentalized manner.
b. The more “effective” prescriptions seem to be the ones that are compatible with an already established usage trend; but for this same reason these prescriptions could be considered to be vacuous or redundant.


