Linguistic Corpora at the HZSK Repository

The digital repository of the Hamburger Zentrum für Sprachkorpora stores and disseminates linguistic resources and tools. Further information can be found here:

Searched: English
Hits: 8
http://hdl.handle.net/11022/0000-0000-4F70-A
general corpus / spoken / discourse

EXMARaLDA Demo Corpus 1.0

A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system.

Language: German, English, French, Spanish, Turkish, Polish, Vietnamese, Swedish, Norwegian, Italian, Russian, Afrikaans, Portuguese

License: HZSK-PUB (public)

Open lock icon indicates accessible resource
CLARIN icon indicates integration into CLARIN
Eye icon indicates online browsable resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C641-0
general corpus / spoken / encyclopedia

The Spoken Wikipedia Corpora

The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.

Language: English, German, Dutch

License: Creative Commons Attribution-ShareAlike 4.0 International (public)

Open lock icon indicates accessible resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-BFF2-1
comparable corpus / written / popular science texts

Covert translation: popular science

Translation corpora of original texts with translations and comparable texts from the genre of popular scientific prose.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C2E7-9
general corpus / written / discourse

Covert translation: Business Communication (old)

Translation corpora of original texts with translations and comparable texts from the genre of external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C2EF-1
general corpus / written / business communication

Covert translation: Business Communication (new)

Translation corpora of original texts with translations and comparable texts from the genre of external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-D4C0-0
general corpus / spoken / discourse

Türkisch-Englisch-Deutsch bei Herkunftssprechern (TEDH)

The TEDH corpus was created as part of the project "Foreign Language Acquisition in German-Turkish bilinguals". It contains interviews in three languages (Turkish, English, and German), comprising 74 communications from 25 different speakers. The bulk of the language material to be integrated, glossed, and annotated was collected by several researchers and is available in audio format. The transcription data and the corpus metadata are processed and stored in EXMARaLDA format.

Language: German, Turkish, English

License: HZSK-ACA (academic)

Open lock icon indicates accessible resource
SSO icon indicates single sign-on resource
http://hdl.handle.net/11022/0000-0000-51E4-3
general corpus / spoken / discourse

Community Interpreting Database Pilot Corpus (ComInDat)

Audio and video recordings of various types of community-interpreted discourse: authentic and simulated doctor-patient communication in German institutions and courtroom communication in US institutions, with varying community languages. Video recordings exist only for the simulated communication. For the authentic interpreted doctor-patient communication, no audio files will be made available.

Language: German, English, Spanish, Turkish, Polish, Portuguese, Romanian, Russian, Haitian

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN
Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-9D16-7
general corpus / written / historic manuscript

Hamburg Corpus of Old Swedish with Syntactic Annotations (HaCOSSA)

Religious and secular prose, law texts, non-fiction literature (geographical, theological, historic, natural science), diploma.

Language: English, German, Latin, Old Swedish, Swedish

License: FID-AKA (restricted)

Closed lock icon indicates restricted resource
Download icon indicates downloads available for this resource