Linguistic Corpora at the HZSK Repository
Corpus type: general corpus (3)
Selkup is an endangered Southern Samoyedic language (Uralic family). The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who between 1962 and 1977 gathered a large amount of material on Selkup in almost all regions where the Selkup people lived. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, which were transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translations into English, Russian and German, and annotation of borrowings. Some texts also have annotations for syntactic structure, semantic roles and information status.
License: CC BY-NC-SA 4.0 (public)
Subcorpus 1 presents part of the euroWiss-Corpus, covering teaching and learning discourse at German and Italian universities in the humanities as well as the technical and natural sciences. It offers access to transcriptions of lectures and seminars aligned with audio recordings, together with the text types used for instruction. The corpus comprises 18 communications, 24 audio recordings, 24 transcriptions, 140,000 transcribed words, 19 identified speakers, 18 students' notes, 2 lecture scripts, 24 chalkboard presentations, 2 PowerPoint presentations, 3 overhead slides, 3 handouts, and 14 schedules/descriptions of recorded lectures/seminars.
Language: German, Italian
License: HZSK-ACA (academic)
Audio recordings of three German/Spanish simultaneous bilingual children, starting at approximately 1 year of age and ending at 2 or 3 years. There are 20–50 recording sessions (interviewer/child interaction) per child, half of them conducted in German and half in Spanish.
Language: German, Spanish
License: HZSK-RES (restricted)