Linguistic Corpora at the HZSK Repository
Corpus type: general corpus (2)
A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system.
Language: German, English, French, Spanish, Turkish, Polish, Vietnamese, Swedish, Norwegian, Italian, Russian, Afrikaans, Portuguese
License: HZSK-PUB (public)
The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.
Language: English, German, Dutch
License: Creative Commons Attribution-ShareAlike 4.0 International (public)