Linguistic Corpora at the HZSK Repository
Selkup Language Corpus (SLC)
The Selkup Language Corpus was created within the project Syntactic description of the Central and Southern Selkup dialects: a corpus based analysis (supported by the German Research Grant; WA 3153/3-1). The primary goal of the project is to build a corpus and to study syntactic structures on that basis. The corpus contains 144 texts that had already been published in written form, provided here with glosses and annotations. All texts have been translated into English, and most also into Russian and German. The corpus also contains rich metadata on the communications and speakers. Both the transcription data and the metadata of the corpus are processed and stored in EXMARaLDA format.
Language: Selkup, Russian
License: CC BY-NC-SA 4.0 (public)
Nganasan Spoken Language Corpus (NSLC)
The Nganasan Spoken Language Corpus (NSLC) was created as part of the project Corpus based grammatical studies on Nganasan (supported by the German Research Grant; WA 3153/2-1). Each text in the corpus is available in at least three languages: the original Nganasan with translations, mostly into Russian and English and sometimes also into German. The corpus contains 55 communications from 15 different speakers. The bulk of the language material to be integrated, glossed and annotated was collected by several researchers and is available in audio format. Both the transcription data and the metadata of the corpus are processed and stored in EXMARaLDA format.
Language: Nganasan, Russian
License: CC BY-NC-SA 4.0 (public)