Linguistic Corpora at the HZSK Repository
Corpus type: general corpus (3)
A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system.
Language: German, English, French, Spanish, Turkish, Polish, Vietnamese, Swedish, Norwegian, Italian, Russian, Afrikaans, Portuguese
License: HZSK-PUB (public)
The Nganasan Spoken Language Corpus (NSLC) was created as part of the project "Corpus based grammatical studies on Nganasan" (supported by the German Research Grant; WA3153/2-1). The corpus contains the same text samples in at least three languages: the original text in Nganasan, with translations mostly into Russian and English and sometimes also into German. It comprises 55 communications from 15 different speakers. The bulk of the language material to be integrated, glossed and annotated was collected by several researchers and is available in audio format. The transcription data and the metadata of the corpus are processed and stored in EXMARaLDA format.
Language: Nganasan, Russian
License: HZSK-ACA (academic)
Audio and video recordings of various types of community interpreted discourse: authentic and simulated doctor-patient communication in German institutions and courtroom communication in US institutions, with varying community languages. Video recordings exist only for the simulated communication. No audio files will be made available for the authentic interpreted doctor-patient communication.
Language: German, English, Spanish, Turkish, Polish, Portuguese, Romanian, Russian, Haitian
License: HZSK-RES (restricted)