A corpus for research on diatopic variation in Standard German: Design and user interface of the ZDL-Regionalkorpus
2022-04-13, 14:30–15:00 (Europe/Vienna), Room 2


The ZDL-Regionalkorpus (Nolda, Barbaresi & Geyken 2021) is a newspaper corpus designed for research on diatopic variation in Standard German. It is hosted by the Zentrum für digitale Lexikographie der deutschen Sprache (ZDL) at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) and is used by lexicographers of the online German dictionary “Digitales Wörterbuch der deutschen Sprache” (DWDS) as an empirical basis for the geographic labels in DWDS articles (cf. https://www.dwds.de/d/regionalangaben). The ZDL-Regionalkorpus and its user interface are also available to any registered DWDS user (https://www.dwds.de/d/korpora/regional; registration is free of charge).
Currently, the ZDL-Regionalkorpus comprises about 31 million articles with about 9 billion tokens from the local and regional sections of daily newspapers from various regions in Germany. As in the non-public (and much smaller) corpus of the “Variantengrammatik” project (http://mediawiki.ids-mannheim.de/VarGra/index.php/Datenerhebung), articles from other newspaper sections have been excluded, since they are often provided by central editorial departments or by news agencies. An extension to Austrian newspapers is in preparation in cooperation with the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) at the Austrian Academy of Sciences (ÖAW). Future versions of the ZDL-Regionalkorpus will also provide data from Swiss newspapers.
The user interface of the ZDL-Regionalkorpus provides a faceted search over diatopic areas, ranging from D-Nordwest to D-Südost. The areal classification is modelled on the “Variantengrammatik” and the “Variantenwörterbuch” (Ammon et al. 2016), but also takes empirical findings of Lameli (2013) into account. Corpus queries can use the full power of the DWDS corpus engine (https://www.dwds.de/d/korpussuche), including queries for lemmas, word forms, phrases, and parts of speech. Frequencies per area and newspaper are presented in tabular and cartographic form. An interactive histogram view plots frequencies per area and year.
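Raw hit counts per area are hard to compare when the areal subcorpora differ in size. The following Python sketch illustrates one common normalisation, hits per million tokens. It is not taken from the DWDS interface or its corpus engine: the function name `per_million` and all hit counts and subcorpus sizes are made-up assumptions for illustration, and only the two area labels are taken from the text above.

```python
def per_million(hits: int, area_tokens: int) -> float:
    """Return the relative frequency as hits per one million tokens."""
    if area_tokens <= 0:
        raise ValueError("area_tokens must be positive")
    return hits * 1_000_000 / area_tokens


# Hypothetical numbers, NOT real corpus data:
# area label -> (hits for some query, tokens in that area's subcorpus)
example = {
    "D-Nordwest": (120, 1_500_000_000),
    "D-Südost": (900, 1_200_000_000),
}

# Normalise each area's hit count by the size of its subcorpus.
relative = {area: per_million(hits, tokens) for area, (hits, tokens) in example.items()}

# List the areas from highest to lowest relative frequency.
for area, value in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{area}: {value:.2f} per million tokens")
```

With such normalised values, a variant that is rare in one area and common in another stands out regardless of how much newspaper text each area contributes.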
This hands-on presentation of the design and the user interface of the ZDL-Regionalkorpus is aimed at both novice and advanced DWDS users. The presentation will focus on recently introduced data and tools. For the presentation, we need one or two tables for laptops and printed material, power outlets, and preferably one flat screen per table.


Ammon, Ulrich et al. 2016. Variantenwörterbuch des Deutschen: Die Standardsprache in Österreich, der Schweiz und Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol sowie Rumänien, Namibia und Mennonitensiedlungen. 2nd ed. Berlin: de Gruyter.
Lameli, Alfred. 2013. Strukturen im Sprachraum: Analysen zur arealtypologischen Komplexität der Dialekte in Deutschland (Linguistik – Impulse & Tendenzen 54). Berlin: de Gruyter.
Nolda, Andreas, Adrien Barbaresi & Alexander Geyken. 2021. Das ZDL-Regionalkorpus: Ein Korpus für die lexikografische Beschreibung der diatopischen Variation im Standarddeutschen. In Lobin, Henning, Andreas Witt & Angelika Wöllstein (eds.), Deutsch in Europa: Sprachpolitisch, grammatisch, methodisch (Institut für Deutsche Sprache: Jahrbuch 2020), 317–322. Berlin: de Gruyter.