Linguistic Corpora at the HZSK Repository
Hamburg Dependency Treebank
The Hamburg Dependency Treebank is, to our knowledge, the largest dependency treebank currently available. Its dependency annotations are genuine, i.e. they were not converted from phrase structures.
Language: German
License: HZSK-ACA (Text) / CC-by-sa-4.0 (Annotation) (academic)
Commented Learner Corpus Academic Writing
Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1 languages and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.
Language: German
License: HZSK-ACA (academic)
euroWiss - Linguistic Profiling of European Academic Education (Subcorpus 1)
Subcorpus 1 presents part of the euroWiss-Corpus covering communication in teaching/learning discourses in instruction at German and Italian universities, in the humanities as well as the technical and natural sciences. It offers access to transcriptions of lectures and seminars aligned with audio recordings and the text types used for instruction. The corpus comprises 18 communications, 24 audio recordings, 24 transcriptions, 140,000 transcribed words, 19 identified speakers, 18 students' notes, 2 lecture scripts, 24 chalkboard presentations, 2 PowerPoint presentations, 3 overhead slides, 3 handouts, and 14 schedules/descriptions of recorded lectures/seminars.
Language: German, Italian
License: HZSK-ACA (academic)
The Hamburg MapTask Corpus (HAMATAC)
Audio recordings of map tasks with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. The maps used for the tasks are available.
Language: German
License: HZSK-ACA (academic)
Hamburg Modern Times Corpus (HaMoTiC)
Audio recordings of a film retelling task with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. The corpus comprises 24 communications plus 1 German reference communication, each between 2 and 16 minutes long. For each speaker, a language learner biography (audio and freely transcribed) is available.
Language: German
License: HZSK-ACA (academic)
Covert translation: popular science
Translation corpora of original texts with translations and comparable texts from the genre of popular scientific prose.
Language: German, English
License: HZSK-ACA (academic)
Covert translation: Business Communication (old)
Translation corpora of original texts with translations and comparable texts from the genre of external business communication.
Language: German, English
License: HZSK-ACA (academic)
Covert translation: Business Communication (new)
Translation corpora of original texts with translations and comparable texts from the genre of external business communication.
Language: German, English
License: HZSK-ACA (academic)