Linguistic Corpora in the HZSK Repository

The digital repository of the Hamburger Zentrum für Sprachkorpora (HZSK) is responsible for storing and delivering linguistic resources and tools. Further information is available here:

Keyword

EXMARaLDA (23)
focus (11)
L2 data (9)
L1 data (8)
adult bilingualism (8)
...
Hits: 50
http://hdl.handle.net/11022/0000-0000-4F70-A
general corpus / spoken / discourse

EXMARaLDA Demo Corpus 1.0

A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system.

Language: German, English, French, Spanish, Turkish, Polish, Vietnamese, Swedish, Norwegian, Italian, Russian, Afrikaans, Portuguese

License: HZSK-PUB (public)

Open lock icon indicates accessible resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-7FC7-2
treebank / written / newspaper article

Hamburg Dependency Treebank

The Hamburg Dependency Treebank is to our knowledge the largest dependency treebank currently available. It consists of genuine dependency annotations, i.e. they have not been transformed from phrase structures.

Language: German

License: HZSK-ACA (Text) / CC-by-sa-4.0 (Annotation) (academic)

Open lock icon indicates accessible resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C64C-5

Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)

The reference corpus of Middle Low German and Low Rhenish texts is based on manuscripts, prints and inscriptions. It is intended to provide an insight into the culture of speech and writing in the Middle Low German and Low Rhenish regions. This spectrum of text types can be used to trace linguistic development on the basis of diatopic and diachronic subcategorisation. The aim of the project is the publication of diplomatically transcribed, lemmatised and grammatically annotated texts. The processed data, especially at the grammatical level, enables a linguistic analysis of Middle Low German and Low Rhenish that goes far beyond what was possible until now.

Language: Middle Low German, Low Rhenish

License: CC-BY 4.0 (public)

Open lock icon indicates accessible resource
CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0006-CD41-A
learner corpus / written / academic writing

Commented Learner Corpus Academic Writing

Authentic texts written by students of the University of Hamburg as part of their studies. The students have various L1s and study various subjects. All of the texts were the subject of writing counseling at the Writing Center Multilingualism (Schreibwerkstatt Mehrsprachigkeit); for some of the texts, comments by peer tutors and several versions are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0001-7DBA-2
general corpus / spoken / discourse

euroWiss - Linguistic Profiling of European Academic Education (Subcorpus 1)

Subcorpus 1 presents part of the euroWiss-Corpus covering communication in teaching/learning discourses in instruction at German and Italian universities, in the humanities as well as the technical and natural sciences; it offers access to transcriptions of lectures and seminars aligned with audio recordings and the text types used for instruction. The corpus comprises 18 communications, 24 audio recordings, 24 transcriptions, 140,000 transcribed words, 19 identified speakers, 18 students' notes, 2 lecture scripts, 24 chalkboard presentations, 2 PowerPoint presentations, 3 overhead slides, 3 handouts, and 14 schedules/descriptions of recorded lectures/seminars.

Language: German, Italian

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-6330-A
general corpus / spoken / discourse

The Hamburg MapTask Corpus (HAMATAC)

Audio and two video recordings of map tasks with adult L2 users of German and one L1 speaker. The speakers' L1 and their L2 proficiencies vary. The maps used for the tasks are available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-6973-9
general corpus / spoken / discourse

Hamburg Modern Times Corpus (HaMoTiC)

Audio recordings of a film retelling task with adult L2 users of German. The speakers' L1 and their L2 proficiencies vary. 24 communications + 1 German reference communication, duration between 2 and 16 minutes. For each speaker, a language learner biography (audio and freely transcribed) is available.

Language: German

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0007-C2EF-1
general corpus / written / business communication

Covert translation: Business Communication (new)

Translation corpora of original texts with translations and comparable texts from the genre external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C2E7-9
general corpus / written / discourse

Covert translation: Business Communication (old)

Translation corpora of original texts with translations and comparable texts from the genre external business communication.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-BFF2-1
comparable corpus / written / popular science texts

Covert translation: popular science

Translation corpora of original texts with translations and comparable texts from the genre popular scientific prose.

Language: German, English

License: HZSK-ACA (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-50DD-D
general corpus / spoken / discourse

ALCEBLA

Audio recordings in Spanish with 23 German/Spanish simultaneous bilingual children living in Germany and attending the Spanish complementary school at the first level. There are 1-6 recordings per child; 11 children were also recorded before they attended the Spanish complementary school. All recordings feature elicited speech: a picture naming task, a story telling task, a morphosyntactic test, a lexical test, and the HAVAS 5. Rich metadata on language use and attitudes in the family submitted by the parents.

Language: German, Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-772F-7
general corpus / spoken / discourse

Catalan in a bilingual context (PhonCAT)

Audio recordings of prompted, read and spontaneous speech data from L1 Catalan speakers from Barcelona. The data is stratified according to three different city districts and three age groups. Speakers' ages vary from approx. 5 to 45 years.

Language: Catalan

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-51E4-3
general corpus / spoken / discourse

Community Interpreting Database Pilot Corpus (ComInDat)

Audio and video recordings of various types of community interpreted discourse (doctor-patient communication, simulated doctor-patient communication, courtroom communication) in German (simulated and authentic doctor-patient communication) and US (courtroom communication) institutions with varying community languages. Video recordings only exist for the simulated communication. For the authentic interpreted doctor-patient communication, no audio files will be made available.

Language: German, English, Spanish, Turkish, Polish, Portuguese, Romanian, Russian, Haitian

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-5225-A
general corpus / spoken / discourse

Consecutive and Simultaneous Interpreting (CoSi)

Audio and video recordings of three lectures in Portuguese, one simultaneously and two consecutively professionally interpreted into German. For the simultaneously interpreted lecture there are different recordings and transcriptions for the participants.

Language: German, Portuguese

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-523B-2
general corpus / spoken / discourse

Dolmetschen im Krankenhaus (DiK)

Audio recordings of various kinds of doctor-patient communication in hospitals. There are both monolingual conversations in German, Portuguese and Turkish, recorded in the respective country, and interpreted conversations recorded in Germany (i.e. in German-Turkish, German-Portuguese, and German-Portuguese/Spanish), about 15-20 recordings of each kind. The persons interpreting are bilingual hospital employees or relatives of the patients, who are all adults living in Germany but with varying knowledge of German.

Language: German, Portuguese, Spanish, Turkish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-A0D3-C
general corpus / spoken / discourse

Faroese Danish Corpus Hamburg 0.2.dan (FADAC-0.2.dan Hamburg)

Audio recordings of semi-structured interviews with bilingual speakers (aged 16-89 years) from various geographical areas on the Faroe Islands. For 37 of the 56 subjects there are recordings in both their L1 Faroese and their L2 Danish. Only the Danish data is available.

Language: Danish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-5C64-9
general corpus / spoken / discourse

Hamburg Adult Bilingual LAnguage (HABLA)

Audio recordings (semi-spontaneous interviews) with German/Italian and German/French bilingual speakers aged approx. 15-55 years at the recording sessions. The simultaneous bilinguals with German and French/Italian as L1s have been recorded twice, i.e. once for each language. The successive bilinguals, with German as L1 and French/Italian as L2 or with French/Italian as L1 and German as L2, all have AOAs between 11 and 38 years and have been recorded using their L2.

Language: German, French, Italian

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-5F0B-B
general corpus / spoken / discourse

Hamburg Corpus of Argentinean Spanish (HaCASpa)

Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two geographic areas. For the intonational experiments, there are audio recordings only, whereas some of the free interviews and map tasks feature video recordings. The material used as stimuli in the experiments is available with references encoded in the transcriptions.

Language: Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-9D16-7
general corpus / written / historic manuscript

Hamburg Corpus of Old Swedish with Syntactic Annotations (HaCOSSA)

Religious and secular prose, law texts, non-fiction literature (geographical, theological, historic, natural science), diploma.

Language: English, German, Latin, Old Swedish, Swedish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-63CE-9
general corpus / spoken / discourse

Hamburg Corpus of Polish in Germany (HamCoPoliG)

Audio recordings of German/Polish bilingual and Polish monolingual adults (16-46 years). Recordings of semi-spontaneous data (3 topics) and renarration of a picture story.

Language: Polish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0007-C6F2-8
general corpus / spoken / flkd: folklore texts, Dyurimi

Nganasan Spoken Language Corpus (NSLC)

The Nganasan Spoken Language Corpus (NSLC) has been created as part of the project "Corpus based grammatical studies on Nganasan" (supported by the German Research Grant; WA3153/2-1). The Spoken Nganasan Corpus contains the same text samples in at least three languages: the original text in Nganasan with translations mostly into Russian and English, sometimes also into German. The corpus contains 55 communications from 15 different speakers. The bulk of the language material to be integrated, glossed and annotated has been collected by several researchers and is available in audio format. The transcription data as well as the metadata of the corpus are processed and stored in EXMARaLDA format.

Language: Nganasan, Russian

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-69DD-2
general corpus / spoken / discourse

Parameterfixierung im Deutschen und Spanischen (PAIDUS)

Audio recordings of five German and five Spanish speaking monolingual children. For the German children there are about 30 recordings (interviewer/child interaction) per child, on an average starting at 9 months and ending at 3 years; for the Spanish children there are on average 15 recordings per child ending at 2 years.

Language: German, Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-70CA-E
general corpus / spoken / discourse

PhonBLA Longitudinalstudie Hamburg

Audio and video recordings of four German/Spanish bilingual children starting at approx. 1 year and 6 months and ending at age 6-7 years with about 100 recordings (interviewer/child interaction) of each child, half of them in each language.

Language: German, Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-6ECE-E
general corpus / spoken / discourse

Phonologie-Erwerb Deutsch-Spanisch als Erste Sprachen (PEDSES)

Audio recordings of three German/Spanish simultaneous bilingual children starting at approx. 1 year and ending at 2 or 3 years. There are 20-50 recording sessions (interviewer/child interaction) per child, half of them conducted in German and half in Spanish.

Language: German, Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-7D27-9
general corpus / spoken / discourse

Phon-CL2

Audio recordings of 15 German subjects in Spain (5 to 36 years old) with Spanish as L2 and AOA > 2 years. Recording sessions in Spanish based on picture naming and story telling etc. Rich metadata on language use and attitudes in the family submitted by the parents.

Language: German, Spanish

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0003-BDFA-F
general corpus / spoken / discourse

Scandinavian Semicommunication in Radio Programmes

Bilingual radio broadcasts of Scandinavian speakers interacting using their respective languages. The speakers have Danish, Norwegian or Swedish as L1 and varying receptive knowledge of the other languages.

Language: Danish, Norwegian, Swedish

License: FID-AKA (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0003-C011-0
general corpus / spoken / discourse

Scandinavian Semicommunication in the Oeresund Region

Bilingual radio broadcasts of Scandinavian speakers interacting using their respective languages. Most speakers have Danish or Swedish as L1 and varying receptive knowledge of the other languages and live in the Oeresund region.

Language: Danish, Swedish

License: FID-AKA (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-7EE3-3
general corpus / spoken / discourse

Sprachvariation in Norddeutschland (SiN)

Audio recordings of adult speakers of Northern German varieties.

Language: German, Low German

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
http://hdl.handle.net/11022/0000-0000-7D90-1
general corpus / spoken / discourse

TÜ_DE-cL2-Korpus

Video recordings in German of eight bilingual children with L1 Turkish and L2 German with AOA of 3-4 years. Several recordings of spontaneous speech (play) during 7-28 months at ages approx. 3-6.5 years, and of elicited language with focus on article usage. Comparable data for the TÜ_DE-L1-Korpus.

Language: German

License: HZSK-RES (restricted)

Closed lock icon indicates restricted resource
CLARIN icon indicates integration into CLARIN Eye icon indicates online browsable resource
http://hdl.handle.net/11022/0000-0000-9B1E-1
general corpus / written / religious text

B4 Tatian Corpus of Deviating Examples 2.1

The present corpus, the Tatian Corpus of Deviating Examples T-CODEX 2.1, provides morpho-syntactic and information structural annotation of parts of the Old High German translation attested in the MS St. Gallen Cod. 56, traditionally called the OHG Tatian, one of the largest prose texts from the classical OHG period. This corpus was designed and annotated by Project B4 of the Collaborative Research Center on Information Structure at Humboldt University Berlin. The present corpus compiles ca. 2,000 deviating examples found in the text portions of the scribes α, β, γ and ε. Each clause structure represents an extra file annotated with the annotation tool EXMARaLDA and searchable via ANNIS, a general-purpose tool for the publication, visualisation and querying of linguistic data collections, developed by Project D1 of the Collaborative Research Center on Information Structure at Potsdam University.

Language: Latin, Old High German

License: Creative Commons Attribution 3.0 Unported License (public)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B23-A
general corpus / written / historic manuscript

B4 Historisches Predigtenkorpus zum Nachfeld

HIPKON is the first corpus based on only one text type (sermons) and on one dialect area, Upper German (Bavarian-Alemannic). The sermons cover the time from Middle High German to the beginning of the New High German period. They were accurately selected so that each of them is representative of one century. Among others, syntax, information structure and discourse structure were annotated in the corpus.

Language: New High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0007-C641-0
general corpus / spoken / encyclopedia

The Spoken Wikipedia Corpora

Language: English, German, Dutch

License: Creative Commons Attribution-ShareAlike 4.0 International (public)

Closed lock icon indicates restricted resource
Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-82AC-B
unknown / unknown / news articles

A5 Hausa News

This corpus of news articles from the online news service of Deutsche Welle contains 4 texts with a total of 2017 tokens.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-82AD-A
unknown / spoken / discourse

A5 Hausa Umarnin Uwa

This corpus of Umarnin Uwa film transcripts contains 47 transcripts with a total of 10194 tokens. It provides information including automatic POS tagging, speaker and extralinguistic information, foreign words and code-switching.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1C-3
general corpus / spoken / discourse

B1 Aja

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Aja

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2D-0
general corpus / written / wiki-article

B7 Wolof (Wikipedia)

The corpus comprises a collection of texts from the Wolof Wikipedia, randomly chosen for their near-standard orthography and language, and treating different topics. The texts are translated manually by a native speaker and automatically tagged by a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2C-1
general corpus / spoken / discourse

B1 Fon

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Fon

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1F-0
general corpus / written / historic manuscript

B4 Sächsische Weltchronik

The corpus contains a chronicle from the 13th century in Middle Low German.

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B20-D
general corpus / written / historic manuscript

B4 Otfrid

The reference corpus Old German contains (annotated) data from the oldest language monuments of German, from the start of the continuous written tradition around 750 until 1050, with approx. 650,000 text words.

Language: Old High German

License: Creative Commons Attribution 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B21-C
general corpus / written / historic manuscript

B4 Muspilli

Complete text, status: work in progress, digitalization, translation to English, manually annotated with parts of speech, syntactic category, grammatical function, clause status, numbers of syllables (per constituent), information status, topic/comment, position of constituent in sentence, definiteness, focus/background, focus marker, comments, source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B22-B
general corpus / written / historic manuscript

B4 Ludolf

The text of this corpus, Ludolf von Sudheims Reise ins Heilige Land (Ludolf of Sudheim’s Journey to the Holy Land), is a travel diary describing the adventures of a group of pilgrims, written in Middle Low German and dated to 1350. For information on the properties of the text, including the manuscripts, see Blust-Thiele (1985). This corpus uses the text edition by Stapelmohr (1937). The first 20 pages of it are tagged for clause type and grammatical function. The corpus includes 6,690 tokens.

Language: Middle Low German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B24-9
general corpus / written / historic manuscript

B4 Heliand

Heliand 1, 4 and 5: complete text, status: final, digitalization, translation to Modern German, manually annotated with parts of speech, syntactic categories, grammatical functions, clause status, numbers of syllables (per constituent), alliteration, information status, topic/comment, position of phrase in sentence, definiteness, focus/background, focus-marker, comments on context, source (bibliography).

Language: Old High German

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B25-8
general corpus / spoken / discourse

B2 Tangale

Tangale sample: sample, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Tangale

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B26-7
general corpus / spoken / discourse

B2 Marghi

Full set: all focus related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated.

Language: Marghi

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B27-6
general corpus / written / discourse

B2 Hausa

Hausa: complete set, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Hausa

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B28-5
general corpus / written / discourse

B2 Guruntum

Guruntum sample: sample, status: final, manually transcribed, glossed and translated to English, annotated wrt. morphology, parts of speech, syntax, gramm. function, sem. roles, focus and focus position (e.g. ex situ) in EXMARaLDA.

Language: Guruntum

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B1D-2
general corpus / written / discourse

B7 Wolof (web)

The corpus comprises a collection of texts from discussion forums on the web, randomly chosen for their near-standard orthography and language, and treating different topics. The texts are translated manually by a native speaker and automatically tagged by a part-of-speech tagger. No further annotation is provided.

Language: Wolof

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2A-3
general corpus / spoken / discourse

B1 Yom

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Yom

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B2B-2
general corpus / spoken / discourse

B1 Foodo

The data sets for each language consist of a small number of mini-dialogues, chosen out of the 189 entries within the Focus Translation Task (cf. Skopeteas et al. 2006: 209ff.) in order to get a basic set of utterances for comparison between the languages dealt with in the project.

Language: Foodo

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource
http://hdl.handle.net/11022/0000-0000-9B29-4
general corpus / spoken / discourse

B2 Bura

Full set: all focus related experiments, status: work in progress, large parts elicited, most of the data transcribed, partly annotated.

Language: Bura

License: Creative Commons Attribution-NonCommercial 3.0 Unported License (academic)

Closed lock icon indicates restricted resource
SSO icon indicates single sign-on resource Download icon indicates downloads available for this resource