The Corpus „Deutsch Heute“ in the DGD

The project "Variation des gesprochenen Deutsch" ("Variation of spoken German") recorded over 1000 hours of speech, which have already informed numerous studies (Sloos 2013, Kisler and Schiel 2018, Hahn and Siebenhaar 2016, 2019, Kleiner 2015) on the regional ‘standard usage’ of German ("Gebrauchsstandard", Deppermann et al. 2013). With the curation of the corpus "Deutsch Heute" ("German Today", DH) in the Archive for Spoken German (AGD) and its dissemination via the Database for Spoken German (DGD), this large corpus, rich in metadata and almost completely transcribed, is now being made available online to the scientific community.
The recording area covers the entire contiguous German-speaking region, including Germany, Austria, Switzerland, East Belgium, Luxembourg, Liechtenstein and South Tyrol. In 195 places, high-school graduates and students of an adult education centre were recorded in high audio quality (Brinckmann et al. 2008).
Building on the experience gained with the corpus "Deutsche Standardsprache" ("German Standard Language", aka "König-Korpus", König 1989), the recordings of "Deutsch Heute" include both spontaneous speech, such as language-biographic interviews and map tasks, and prompted speech elicited through reading tasks, picture naming or translations. Within the landscape of German oral corpora, DH stands in line with other variation corpora such as "Deutsche Mundarten" ("German Dialects", aka "Zwirner-Korpus", Zwirner and Bethge 1958) or "Deutsche Umgangssprachen" ("German Colloquial Speech", aka "Pfeffer-Korpus", Pfeffer and Lohnes 1984), all of which are also available via the DGD. With its design, focussing on systematic regional variation and a few highly controlled speech types, DH can also be seen as complementary to the Forschungs- und Lehrkorpus Gesprochenes Deutsch ("Research and Teaching Corpus of Spoken German", FOLK, Schmidt 2014), whose design aims at maximal variation across natural interaction types but is less systematic in its stratification of speaker properties. Taken together, DH and FOLK open new possibilities for combining variational and interactional linguistics.
Each of the 10 recording situations in DH that every participant went through (resulting in 90 minutes per speaker) is rich enough to count as a self-contained corpus. For example, the text "Nordwind und Sonne" ("The North Wind and the Sun") was read once at normal and once at fast speed, and forms the basis for speech-rate analyses in the SpuRD project (Hahn and Siebenhaar 2016, 2019). Other parts of the corpus form the basis for the commented maps of the AADG (Atlas zur Aussprache des deutschen Gebrauchsstandards), which are also available online.
In a dedicated curation effort over recent years, the metadata, recordings and transcripts of DH were transformed to suit the DGD, quality-controlled on several levels and further enriched with lemmatisation and POS tagging.
Our poster will give an overview of the data and metadata of the DH corpus and demonstrate example queries in the DGD that exploit the corpus’s main characteristics: regional balance, rich metadata, a variety of recording situations and a large amount of data (over 6 million tokens).

