The Corpus „Deutsch Heute“ in the DGD

Over 1000 hours of speech were recorded by the project "Variation des gesprochenen Deutsch" ("Variation of Spoken German") and have already informed numerous studies (Sloos 2013, Kisler and Schiel 2018, Hahn and Siebenhaar 2016, 2019, Kleiner 2015) on the regional "standard usage" of German ("Gebrauchsstandard", Deppermann et al. 2013). With the curation of the corpus "Deutsch Heute" ("German Today", DH) in the Archive for Spoken German (AGD) and its dissemination via the Database for Spoken German (DGD, dgd.ids-mannheim.de), this large corpus, rich in metadata and almost completely transcribed, is now being made available online to the scientific community.
The recording area covers the entire contiguous German-speaking area, including Germany, Austria, Switzerland, East Belgium, Luxembourg, Liechtenstein and South Tyrol. In 195 places, high-school graduates and students of adult education centres were recorded in high audio quality (Brinckmann et al. 2008).
Building on experience with the corpus "Deutsche Standardsprache" ("German Standard Language", aka "König-Korpus", König 1989), the recordings of "Deutsch Heute" include both spontaneous speech, such as language-biographic interviews and map tasks, and prompted speech elicited through reading tasks, picture naming or translations. Within the landscape of German oral corpora, DH stands in line with other variation corpora such as "Deutsche Mundarten" ("German Dialects", aka Zwirner-Korpus, Zwirner and Bethge 1958) or "Deutsche Umgangssprachen" ("German Colloquial Speech", aka Pfeffer-Korpus, Pfeffer and Lohnes 1984), all of which are also available via the DGD. With its design, focussing on systematic regional variation and a few highly controlled speech types, DH can also be seen as complementary to the Forschungs- und Lehrkorpus Gesprochenes Deutsch ("Research and Teaching Corpus of Spoken German", FOLK, Schmidt 2014), whose design aims at maximal variation across natural interaction types but is less systematic in its stratification of speaker properties. Taken together, DH and FOLK open new possibilities for combining variational and interactional linguistics.
Each of the 10 recording situations in DH that every participant went through (resulting in 90 minutes per speaker) is rich enough to count as a self-contained corpus. For example, the text "Nordwind und Sonne" ("The North Wind and the Sun") was read once at normal and once at fast speed, and these readings form the basis for speech-rate analyses in the SpuRD project (Hahn and Siebenhaar 2016, 2019). Other parts of the corpus form the basis for the commented maps of the AADG (Atlas zur Aussprache des deutschen Gebrauchsstandards), which are also available online (http://prowiki.ids-mannheim.de/bin/view/AADG/).
In a dedicated curation effort over recent years, the metadata, recordings and transcripts of DH were transformed to suit the DGD, quality-controlled on several levels and further enriched with lemmatisation and POS tagging.
Our poster will give an overview of the data and metadata of the DH corpus and demonstrate example queries in the DGD that exploit the corpus' main characteristics: regional balance, rich metadata, a variety of recording situations and the large amount of data (over 6 million tokens).


References
  • Brinckmann, Caren, Stefan Kleiner, Ralf Knöbl & Nina Berend. 2008. German Today: An areally extensive corpus of spoken Standard German. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08). Marrakech, Morocco. European Language Resources Association (ELRA), 3185–3191.
  • Coene, Martine, Annemiek Hammer, Wojtek Kowalczyk, Louis ten Bosch, Bart Vaerenberg & Paul Govaerts. 2013. Quantifying cross-linguistic variation in grapheme-to-phoneme mapping. In Proceedings of Interspeech 2013. Lyon, France, 1854–1857.
  • Deppermann, Arnulf, Stefan Kleiner & Ralf Knöbl. 2013. 'Standard usage': Towards a realistic conception of spoken standard German. In Auer, Peter, Javier Caro Reina & Göz Kaufmann (eds.), Language Variation - European Perspectives IV. Selected papers from the Sixth International Conference on Language Variation in Europe (ICLaVE 6), Freiburg, June 2011, 83–116. Amsterdam/Philadelphia: Benjamins (SILV 14).
  • Hahn, Matthias & Beat Siebenhaar. 2016. Sprechtempo und Reduktion im Deutschen (SpuRD). In Jokisch, Oliver (ed.), Elektronische Sprachsignalverarbeitung 2016 (ESSV2016). Dresden: TUDpress, 198–205.
  • Hahn, Matthias & Beat Siebenhaar. 2019. Spatial Variation of Articulation Rate and Phonetic Reduction in Standard-Intended German. In Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS), Melbourne, Australia, 2695–2699.
  • Hansen-Morath, Sandra, Anja Geumann & Renate Raffelsiefen. 2019. Vergleich der Quantität, Qualität und Dynamik in den deutschen -Lauten. In Proceedings of the Conference on Phonetics & Phonology in German-speaking Countries (P&P 13). Berlin, Germany, 77-80.
  • Kisler, Thomas & Florian Schiel. 2018. Towards a speaker localization from spontaneous speech: North-south classification for speakers of contemporary German. In Elektronische Sprachsignalverarbeitung 2018 (ESSV2018), 200–207.
  • Kleiner, Stefan. 2015. "Deutsch heute" und der Atlas zur Aussprache des deutschen Gebrauchsstandards. In Kehrein, Roland, Alfred Lameli & Stefan Rabanus (eds.), Regionale Variation des Deutschen. Projekte und Perspektiven, 489–518. Berlin: De Gruyter.
  • König, Werner. 1989. Atlas zur Aussprache des Schriftdeutschen in der Bundesrepublik Deutschland. 2 volumes. Ismaning: Hueber.
  • Pfeffer, J. Alan & Walter F. W. Lohnes. 1984. Grunddeutsch. Texte zur gesprochenen deutschen Gegenwartssprache. (Phonai, vol. 28–30). Tübingen: Niemeyer.
  • Schmidt, Thomas. 2014. The Research and Teaching Corpus of Spoken German – FOLK. In Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14), 383–387. European Language Resources Association (ELRA).
  • Sloos, Marjoleine. 2013. The reversal of the BÄREN/BEEREN merger in Austrian Standard German. In Gonia Jarema & Gary Libben (eds.), Phonological and Phonetic Considerations of Lexical Processing. Thematic Issue of The Mental Lexicon 8:3. John Benjamins.
  • Zwirner, Eberhard & Wolfgang Bethge. 1958. Erläuterungen zu den Texten, volume 1 of Lautbibliothek der deutschen Mundarten. Göttingen: Vandenhoeck & Ruprecht.