Linguistic Corpora at the HZSK Repository
euroWiss - Linguistic Profiling of European Academic Education (Subcorpus 1)
Subcorpus 1 presents the part of the euroWiss corpus that covers teaching and learning discourse at German and Italian universities, in the humanities as well as the technical and natural sciences. It offers access to transcriptions of lectures and seminars aligned with audio recordings, together with the text types used for instruction. The corpus comprises 18 communications, 24 audio recordings, 24 transcriptions, 140,000 transcribed words, 19 identified speakers, 18 students' notes, 2 lecture scripts, 24 chalkboard presentations, 2 PowerPoint presentations, 3 overhead slides, 3 handouts, and 14 schedules/descriptions of the recorded lectures/seminars.
Language: German, Italian
License: HZSK-ACA (academic)
The Hamburg MapTask Corpus (HAMATAC)
Audio recordings of map tasks with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. The maps used for the tasks are available.
Language: German
License: HZSK-ACA (academic)
Hamburg Modern Times Corpus (HaMoTiC)
Audio recordings of a film retelling task with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. The corpus contains 24 communications plus 1 German reference communication, each lasting between 2 and 16 minutes. For each speaker, a language learner biography (audio and freely transcribed) is available.
Language: German
License: HZSK-ACA (academic)
A5 Hausa Umarnin Uwa
This corpus of Umarnin Uwa film transcripts contains 47 transcripts with a total of 10,194 tokens. It provides information including automatic POS tagging, speaker and extralinguistic information, foreign words and code-switching.
Language: Hausa
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B1 Aja
The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.
Language: Aja
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B1 Fon
The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.
Language: Fon
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B1 Foodo
The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.
Language: Foodo
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B1 Yom
The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.
Language: Yom
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B2 Bura
Full set: all focus-related experiments. Status: work in progress; large parts elicited, most of the data transcribed, partly annotated.
Language: Bura
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B2 Marghi
Full set: all focus-related experiments. Status: work in progress; large parts elicited, most of the data transcribed, partly annotated.
Language: Marghi
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)
B2 Tangale
Tangale sample. Status: final; manually transcribed, glossed and translated into English; annotated with respect to morphology, parts of speech, syntax, grammatical function, semantic roles, focus and focus position (e.g. ex situ) in EXMARaLDA.
Language: Tangale
License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)