Linguistic Corpora at the HZSK Repository

The digital repository of the Hamburger Zentrum für Sprachkorpora (Hamburg Centre for Language Corpora) stores and disseminates linguistic resources and tools. Further information can be found here:

Search term: written (28 hits)
http://hdl.handle.net/11022/0000-0000-7FC7-2
treebank / written / newspaper article

Hamburg Dependency Treebank

The Hamburg Dependency Treebank is, to our knowledge, the largest dependency treebank currently available. It consists of genuine dependency annotations, i.e. annotations that were created directly as dependencies rather than converted from phrase structures.

Language: German

License: HZSK-ACA (Text) / CC-by-sa-4.0 (Annotation) (academic)

Access: open; single sign-on; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0006-473B-9

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

The reference corpus of Middle Low German and Low Rhenish texts is based on manuscripts, prints and inscriptions. It is intended to provide insight into the culture of speech and writing in the Middle Low German and Low Rhenish regions. This spectrum of text types can be used to trace linguistic development on the basis of diatopic and diachronic subcategorisation. The aim of the project is to publish diplomatically transcribed, lemmatised and grammatically annotated texts. The processed data, especially at the grammatical level, enable a linguistic analysis of Middle Low German and Low Rhenish that goes far beyond what was previously possible.

Language: Undefined

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-C4B1-3

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

Same description as the previous entry.

Language: Undefined

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-C2FA-4

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

Same description as the previous entry.

Language: Undefined

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-CF62-2

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

Same description as the previous entry.

Language: Undefined

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-C64C-5

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

Same description as the previous entry.

Language: Middle Low German, Low Rhenish

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-CA03-2

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

Same description as the previous entry.

Language: Undefined

License: CC-BY 4.0 (public)

Access: open; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0006-CD41-A
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various first languages (L1) and study various subjects. All of the texts were the subject of a writing consultation at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Access: restricted; single sign-on; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0007-C2EF-1
general corpus / written / business communication

Covert translation: Business Communication (new)

Translation corpus of original texts with their translations and comparable texts from the genre of external business communication.

Language: German, English

License: HZSK-ACA (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0007-C2E7-9
general corpus / written / discourse

Covert translation: Business Communication (old)

Same description as the previous entry.

Language: German, English

License: HZSK-ACA (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0007-BFF2-1
comparable corpus / written / popular science texts

Covert translation: popular science

Translation corpus of original texts with their translations and comparable texts from the genre of popular scientific prose.

Language: German, English

License: HZSK-ACA (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B23-A
general corpus / written / historic manuscript

B4 Historisches Predigtenkorpus zum Nachfeld

HIPKON is the first corpus based on a single text type (sermons) and a single dialect area, Upper German (Bavarian-Alemannic). The sermons cover the period from Middle High German to the beginning of New High German. They were carefully selected so that each is representative of one century. The annotation layers include syntax, information structure and discourse structure, among others.

Language: New High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0001-B735-5
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 1.1)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various first languages (L1) and study various subjects. All of the texts were the subject of a writing consultation at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Access: restricted; single sign-on
http://hdl.handle.net/11022/0000-0000-9B27-6
general corpus / written / discourse

B2 Hausa

Hausa: complete set; status: final. Manually transcribed, glossed and translated into English, and annotated in EXMARaLDA for morphology, parts of speech, syntax, grammatical function, semantic roles, focus and focus position (e.g. ex situ).

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B24-9
general corpus / written / historic manuscript

B4 Heliand

Heliand 1, 4 and 5: complete text; status: final. Digitized, translated into Modern German, and manually annotated for parts of speech, syntactic categories, grammatical functions, clause status, number of syllables (per constituent), alliteration, information status, topic/comment, position of the phrase in the sentence, definiteness, focus/background and focus markers, with comments on context and source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B21-C
general corpus / written / historic manuscript

B4 Muspilli

Complete text; status: work in progress. Digitized, translated into English, and manually annotated for parts of speech, syntactic category, grammatical function, clause status, number of syllables (per constituent), information status, topic/comment, position of the constituent in the sentence, definiteness, focus/background and focus marker, with comments and source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B22-B
general corpus / written / historic manuscript

B4 Ludolf

The text of this corpus, Ludolf von Sudheims Reise ins Heilige Land (Ludolf of Sudheim’s Journey to the Holy Land), is a travel diary describing the adventures of a group of pilgrims; it was written in Middle Low German and dates back to 1350. For information on the properties of the text, including the manuscripts, see Blust-Thiele (1985). This corpus uses the text edition by Stapelmohr (1937), the first 20 pages of which are tagged for clause type and grammatical function. The corpus includes 6,690 tokens.

Language: Middle Low German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B1E-1
general corpus / written / religious text

B4 Tatian Corpus of Deviating Examples 2.1

The present corpus, the Tatian Corpus of Deviating Examples T-CODEX 2.1, provides morphosyntactic and information-structural annotation of parts of the Old High German translation attested in MS St. Gallen Cod. 56, traditionally called the OHG Tatian, one of the largest prose texts from the classical OHG period. The corpus was designed and annotated by Project B4 of the Collaborative Research Center on Information Structure at Humboldt University Berlin. It compiles approx. 2,000 deviating examples found in the text portions of scribes α, β, γ and ε. Each clause structure is stored as a separate file, annotated with the annotation tool EXMARaLDA and searchable via ANNIS, a general-purpose tool for publishing, visualising and querying linguistic data collections, developed by Project D1 of the Collaborative Research Center on Information Structure at Potsdam University.

Language: Latin, Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (public)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0001-B734-6
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 1.0)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various first languages (L1) and study various subjects. All of the texts were the subject of a writing consultation at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Access: restricted; single sign-on
http://hdl.handle.net/11022/0000-0000-9B2D-0
general corpus / written / wiki-article

B7 Wolof (Wikipedia)

The corpus consists of a collection of texts on various topics from the Wolof Wikipedia, chosen at random from among texts with near-standard orthography and language. The texts were translated manually by a native speaker and automatically tagged with a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B28-5
general corpus / written / discourse

B2 Guruntum

Guruntum: sample; status: final. Manually transcribed, glossed and translated into English, and annotated in EXMARaLDA for morphology, parts of speech, syntax, grammatical function, semantic roles, focus and focus position (e.g. ex situ).

Language: Guruntum

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9B1D-2
general corpus / written / discourse

B7 Wolof (web)

The corpus consists of a collection of texts on various topics from web discussion forums, chosen at random from among texts with near-standard orthography and language. The texts were translated manually by a native speaker and automatically tagged with a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0001-B732-8
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing (KoLaS 2.0)

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various first languages (L1) and study various subjects. All of the texts were the subject of a writing consultation at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit). For some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Access: restricted; single sign-on
http://hdl.handle.net/11022/0000-0000-9B1F-0
general corpus / written / historic manuscript

B4 Sächsische Weltchronik

The corpus contains a 13th-century chronicle in Middle Low German.

Language: Middle Low German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available
http://hdl.handle.net/11022/0000-0000-9D16-7
general corpus / written / historic manuscript

Hamburg Corpus of Old Swedish with Syntactic Annotations (HaCOSSA)

Religious and secular prose, law texts, non-fiction literature (geographical, theological, historical, natural science) and diplomas (charters).

Language: English, German, Latin, Old Swedish, Swedish

License: FID-AKA (restricted)

Access: restricted; downloads available
http://hdl.handle.net/11022/0000-0007-D0F1-D
general corpus / written / historic manuscript

Hamburg Old Scandinavian Text Collection (HOSTCol)

Law texts, chapbooks and miscellaneous literature in Old Swedish and Old Danish.

Language: Old Swedish, Old Danish

License: FID-AKA (restricted)

Access: restricted
http://hdl.handle.net/11022/0000-0007-CA47-6
general corpus / written / Ethiopic literature

TraCES

Corpus of the Classical Ethiopic language (Ge'ez), produced by the TraCES project (https://www.traces.uni-hamburg.de/en/about.html) in 2014-2019. The corpus is morphologically annotated and freely accessible for online search. The current corpus is a beta version and should be treated as work in progress, since annotation has been carried out to varying degrees of detail.

Language: Ethiopic

License: BY-NC-ND 4.0 (academic)

Access: restricted; integrated into CLARIN; downloads available
http://hdl.handle.net/11022/0000-0000-9B20-D
general corpus / written / historic manuscript

B4 Otfrid

The Reference Corpus of Old German contains (annotated) data from the oldest German language records, from the beginning of continuous written transmission around 750 until 1050, comprising approx. 650,000 text words.

Language: Old High German

License: Creative Commons Attribution 3.0 Unported License (academic)

Access: restricted; single sign-on; downloads available