Linguistic Corpora at the HZSK Repository

The digital repository of the Hamburger Zentrum für Sprachkorpora stores and disseminates linguistic resources and tools.

License type: academic (31 hits)
http://hdl.handle.net/11022/0000-0000-7FC7-2
treebank / written / newspaper article

Hamburg Dependency Treebank

The Hamburg Dependency Treebank is to our knowledge the largest dependency treebank currently available. It consists of genuine dependency annotations, i.e. they have not been transformed from phrase structures.

Language: German

License: HZSK-ACA (Text) / CC-by-sa-4.0 (Annotation) (academic)

Open lock icon indicates accessible resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0006-CD41-A
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1s and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0001-7DBA-2
general corpus / spoken / discourse

euroWiss - Linguistic Profiling of European Academic Education (Subcorpus 1)

Subcorpus 1 is part of the euroWiss corpus and covers teaching and learning discourse at German and Italian universities, in the humanities as well as the technical and natural sciences. It offers access to transcriptions of lectures and seminars aligned with audio recordings, together with the text types used for instruction. The corpus comprises 18 communications, 24 audio recordings, 24 transcriptions, 140,000 transcribed words, 19 identified speakers, 18 students' notes, 2 lecture scripts, 24 chalkboard presentations, 2 PowerPoint presentations, 3 overhead slides, 3 handouts, and 14 schedules/descriptions of the recorded lectures and seminars.

Language: German, Italian

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-6330-A
general corpus / spoken / discourse

The Hamburg MapTask Corpus (HAMATAC)

Audio and two video recordings of map tasks with adult L2 users of German and one L1 speaker. The speakers' L1 and their L2 proficiencies vary. The maps used for the tasks are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-6973-9
general corpus / spoken / discourse

Hamburg Modern Times Corpus (HaMoTiC)

Audio recordings of a film retelling task with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. The corpus comprises 24 communications plus 1 German reference communication, each lasting between 2 and 16 minutes. For each speaker, a language learner biography (audio and freely transcribed) is available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0007-C2EF-1
general corpus / written / business communication

Covert translation: Business Communication (new)

Translation corpora of original texts with translations and comparable texts from the genre external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C2E7-9
general corpus / written / discourse

Covert translation: Business Communication (old)

Translation corpora of original texts with translations and comparable texts from the genre external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-BFF2-1
comparable corpus / written / popular science texts

Covert translation: popular science

Translation corpora of original texts with translations and comparable texts from the genre popular scientific prose.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0001-B732-8
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 2.0)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1s and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
http://hdl.handle.net/11022/0000-0001-B735-5
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 1.1)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1s and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
http://hdl.handle.net/11022/0000-0001-B734-6
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 1.0)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1s and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource
http://hdl.handle.net/11022/0000-0007-CA47-6
general corpus / written / Ethiopic literature

TraCES

Language: Ethiopic

License: BY-NC-ND 4.0 (academic)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B27-6
general corpus / written / discourse

B2 Hausa

Hausa: complete set, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B29-4
general corpus / spoken / discourse

B2 Bura

Full set: all focus related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated.

Language: Bura

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-82AC-B
unknown / unknown / news articles

A5 Hausa News

This corpus of news articles from the online news service of Deutsche Welle contains 4 texts with a total of 2017 tokens.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-82AD-A
unknown / spoken / discourse

A5 Hausa Umarnin Uwa

This corpus of Umarnin Uwa film transcripts contains 47 transcripts with a total of 10194 tokens. It provides information including automatic POS tagging, speaker and extralinguistic information, foreign words and code-switching.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1C-3
general corpus / spoken / discourse

B1 Aja

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Aja

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1D-2
general corpus / written / discourse

B7 Wolof (web)

The corpus consists of a collection of texts from web discussion forums, randomly chosen for their near-standard orthography and language, and covering different topics. The texts were translated manually by a native speaker and automatically tagged by a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2C-1
general corpus / spoken / discourse

B1 Fon

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Fon

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2B-2
general corpus / spoken / discourse

B1 Foodo

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Foodo

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2A-3
general corpus / spoken / discourse

B1 Yom

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Yom

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B23-A
general corpus / written / historic manuscript

B4 Historisches Predigtenkorpus zum Nachfeld

HIPKON is the first corpus based on only one text type (sermons) and on one dialect area, Upper German (Bavarian-Alemannic). The sermons cover the time from Middle High German to the beginning of the New High German period. They were carefully selected so that each of them is representative of one century. Among other things, syntax, information structure and discourse structure were annotated in the corpus.

Language: New High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B26-7
general corpus / spoken / discourse

B2 Marghi

Full set: all focus related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated.

Language: Marghi

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2D-0
general corpus / written / wiki-article

B7 Wolof (Wikipedia)

The corpus consists of a collection of texts from the Wolof Wikipedia, randomly chosen for their near-standard orthography and language, and covering different topics. The texts were translated manually by a native speaker and automatically tagged by a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1F-0
general corpus / written / historic manuscript

B4 Sächsische Weltchronik

The corpus contains a chronicle from the 13th century in Middle Low German.

Language: Middle Low German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B20-D
general corpus / written / historic manuscript

B4 Otfrid

The Old German reference corpus contains (annotated) data from the oldest written records of German, from the beginning of continuous written transmission around 750 until 1050, with approx. 650,000 text words.

Language: Old High German

License: Creative Commons Attribution 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B21-C
general corpus / written / historic manuscript

B4 Muspilli

Complete text, status: work in progress, digitalization, translation to English, manually annotated with parts of speech, syntactic category, grammatical function, clause status, numbers of syllables (per constituent), information status, topic/comment, position of constituent in sentence, definiteness, focus/background, focus marker, comments, source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B22-B
general corpus / written / historic manuscript

B4 Ludolf

The text of this corpus, Ludolf von Sudheims Reise ins Heilige Land (Ludolf of Sudheim’s Journey to the Holy Land), is a travel diary describing the adventures of a group of pilgrims, written in Middle Low German and dated to 1350. For information on the properties of the text, including the manuscripts, see Blust-Thiele (1985). This corpus uses the text edition by Stapelmohr (1937). The first 20 pages of it are tagged for clause type and grammatical function. The corpus includes 6,690 tokens.

Language: Middle Low German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B28-5
general corpus / written / discourse

B2 Guruntum

Guruntum sample: sample, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Guruntum

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B25-8
general corpus / spoken / discourse

B2 Tangale

Tangale sample: sample, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Tangale

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B24-9
general corpus / written / historic manuscript

B4 Heliand

Heliand 1, 4 and 5: complete text, status: final, digitalization, translation to Modern German, manually annotated with parts of speech, syntactic categories, grammatical functions, clause status, numbers of syllables (per constituent), alliteration, information status, topic/comment, position of phrase in sentence, definiteness, focus/background, focus-marker, comments on context, source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource