The competition between the Dutch V1 and syndetic conditionals


References

Brysbaert, Marc, Michael Stevens, Simon De Deyne, Wouter Voorspoels & Gert Storms. 2014. Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica 150. 80–84. https://doi.org/10.1016/j.actpsy.2014.04.010.

Keuleers, Emmanuel, Marc Brysbaert & Boris New. 2010. SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods 42(3). 643–650. https://doi.org/10.3758/BRM.42.3.643.

Mandera, Paweł, Emmanuel Keuleers & Marc Brysbaert. 2017. Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language 92. 57–78. https://doi.org/10.1016/j.jml.2016.04.001.

Piersoul, Jozefien, Robbert De Troij & Freek Van De Velde. 2021. 150 years of written Dutch: The construction of the Dutch Corpus of Contemporary and Late Modern Periodicals. Nederlandse Taalkunde 26(3). 339–362. https://doi.org/10.5117/NEDTAA2021.3.002.PIER.

Stefanowitsch, Anatol & Stefan Th. Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8(2). 209–243. https://doi.org/10.1075/ijcl.8.2.03ste.

Abstract

We consider a pair of near-synonymous syntactic variants in Dutch: V1 conditionals (1) and syndetic conditionals (2). The two variants have long coexisted, but over time V1 conditionals have lost ground to the newer syndetic conditionals. We investigate whether the choice between the two variants can be predicted statistically.

(1) Schijnt de zon, dan gaan we naar het strand.

Shines the sun, then go we to the beach

‘Should the sun shine, we go to the beach.’

(2) Als de zon schijnt, dan gaan we naar het strand.

If the sun shines, then go we to the beach

‘If the sun shines, we go to the beach.’

Our data are taken from the CCLAMP corpus (Piersoul, De Troij & Van De Velde 2021), which comprises 200 million tokens of written Dutch dating from 1837 to 1999. The corpus is balanced for both region (Belgium and the Netherlands) and genre (literary and cultural magazines). An initial corpus search identified 280,698 sentence-initial verbs and 55,003 sentence-initial als conjunctions. After sampling, followed by automated and manual filtering of the instances, we retained 3,867 V1 conditionals and 3,443 syndetic conditionals.

We coded the 7,310 conditional subordinate clauses for the following variables:
- Type of conditional: V1 vs. syndetic
- Integration into the main clause: integrated, resumptive, non-integrated
- Epistemic modal: present vs. absent
- Tense: non-past vs. past
- Subject animacy: human, concrete, abstract, rest
- Year: the year of attestation

We examine the differences between the two types of conditionals in two analyses: a traditional logistic regression, and a collexeme analysis combined with semantic vectors and multidimensional scaling. The first analysis consists of a generalized linear model with a logit link function. The dependent variable is the type of conditional, which is regressed on integration, epistemic modal, tense and subject animacy. Each of these predictors is entered in interaction with the year of attestation, so that changes in its effect over time can be detected.
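
As a rough sketch, the model described above can be written out as follows. The notation is ours and rests on assumptions the abstract does not state: V1 is taken as the modeled outcome, the categorical predictors are dummy-coded into indicators x_ik, and year enters as a linear numeric term.

```latex
\[
\operatorname{logit} P(y_i = \mathrm{V1})
  = \beta_0
  + \beta_{\mathrm{yr}}\,\mathrm{year}_i
  + \sum_{k} \beta_k\, x_{ik}
  + \sum_{k} \gamma_k\,\bigl(x_{ik} \times \mathrm{year}_i\bigr)
\]
```

Here k ranges over the non-reference levels of integration, epistemic modal, tense and subject animacy. A non-zero gamma_k would indicate that the association between that level and the choice of V1 shifts over the period covered by the corpus.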

For the second analysis, we carried out two separate collexeme analyses (Stefanowitsch & Gries 2003), one for each type of conditional, ranking the collexemes by log-likelihood. Running the analyses separately shows both which verbs are attracted to each construction and how far the two constructions overlap. We then applied two-dimensional multidimensional scaling to the semantic vectors (Mandera, Keuleers & Brysbaert 2017) of the top 100 collexemes of the two conditional constructions, and inspected the result for clustering by type of conditional, verb frequency (Keuleers, Brysbaert & New 2010), and verb concreteness (Brysbaert et al. 2014).
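
The log-likelihood ranking used in a collexeme analysis can be illustrated with a minimal Python sketch. It computes the G² statistic for a 2×2 table of a verb against a construction. The function names, the table layout and all counts below are illustrative assumptions, not the figures or the implementation used in this study.

```python
import math

def log_likelihood(a, b, c, d):
    """G2 statistic for the 2x2 table [[a, b], [c, d]].

    a: verb inside the construction    b: verb elsewhere in the corpus
    c: other verbs in the construction d: other verbs elsewhere
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n  # expected count under independence
            if o > 0:                  # an empty cell contributes nothing
                total += o * math.log(o / e)
    return 2 * total

def rank_collexemes(verb_counts, construction_total, corpus_total):
    """Rank verbs by G2, strongest association first.

    verb_counts maps verb -> (frequency in construction, frequency in corpus).
    Returns a list of (verb, G2, direction) tuples.
    """
    ranked = []
    for verb, (in_cx, in_corpus) in verb_counts.items():
        a = in_cx
        b = in_corpus - in_cx
        c = construction_total - in_cx
        d = corpus_total - construction_total - b
        expected = in_corpus * construction_total / corpus_total
        if a > expected:
            direction = "attracted"
        elif a < expected:
            direction = "repelled"
        else:
            direction = "neutral"
        ranked.append((verb, log_likelihood(a, b, c, d), direction))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Invented toy counts, for illustration only.
toy_counts = {"zijn": (200, 20000), "blijken": (50, 500), "lopen": (2, 1000)}
ranked = rank_collexemes(toy_counts, construction_total=1000, corpus_total=100000)
```

Because G² measures only the strength of an association, the sketch also records its direction, so that attracted verbs can be separated from repelled ones before a top-ranked list is taken.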

We find that V1 conditionals exhibit reduced integration, a preference for epistemic modals, less animate subjects, and a preference for abstract and infrequent verbs. Taken as indicators of the semantics of V1 conditionals, these properties suggest that the construction has been refunctionalized to convey tentative or counterfactual meanings. Driven by a tendency to avoid similarity between variants, language users pick up on subtle distinctions that likely stem from historical distributional differences. As the new variant (the syndetic conditional) establishes itself in prototypical contexts, the displaced variant (the V1 conditional) becomes associated with a specialized niche.