About the HZSK Repository
The digital repository of the HZSK was created to support archiving, maintenance, distribution, and exploitation of spoken language corpora. These corpora usually include audio and/or video recordings, transcriptions, additional data and structured metadata.
Primarily focussing on the topic 'Multilingualism', this collection of corpora is made freely available for scientific research and teaching. However, depending on the corpus, individual registration might be necessary.
Belonging to the CLARIN group, the repository meets the criteria of the CLARIN Center Assessment as well as the Data Seal of Approval. This means:
- The data in the HZSK repository can be identified and cited unambiguously and persistently (handle system); older versions remain accessible (see the guidelines for versioning).
- The individual corpora can be searched via the Federated Content Search of the Virtual Language Observatory (VLO)
- Single sign-on via Shibboleth (CLARIN IdP) is possible
- The metadata of every HZSK corpus listed in the present repository is made searchable through OAI-PMH metadata harvesting by the CLARIN language resources catalog
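As a rough illustration of the metadata harvesting mentioned in the list above, the following minimal sketch builds OAI-PMH request URLs. The endpoint address and the helper name `build_oai_request` are assumptions made for this example, since the actual base URL of the repository is not stated here. `ListRecords`, `Identify` and the `oai_dc` metadata prefix are part of the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAI-PMH base URL is not given in this text.
BASE_URL = "https://repository.example.org/oai"


def build_oai_request(verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and its arguments."""
    query = {"verb": verb, **arguments}
    return f"{BASE_URL}?{urlencode(query)}"


# ListRecords needs a metadata format; oai_dc must be offered by every
# OAI-PMH repository, so it is a safe choice for a sketch.
list_records_url = build_oai_request("ListRecords", metadataPrefix="oai_dc")

# Identify takes no further arguments and returns general repository info.
identify_url = build_oai_request("Identify")

print(list_records_url)
print(identify_url)
```

A harvester would send HTTP GET requests to such URLs and parse the XML responses. That step is omitted here because it depends on the real endpoint.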
The detailed technical documentation is available here.
The HZSK repository emerged from the projects “CLARIN” (funded by the BMBF) and “LIS” (funded by the DFG) between 2011 and 2013 at the University of Hamburg.
Corpora of the SFB 538 "Multilingualism"
At the SFB 538 ‘Multilingualism’, a variety of corpora were created, documenting multilingual communication (e.g. interpreting), language development of multilingual speakers (e.g. language acquisition, language attrition) and aspects of social, individual and historical multilingualism.
Corpora of the SFB 632 "Information structure"
Many corpora of the Collaborative Research Center 632 (Sonderforschungsbereich / SFB 632) "Information Structure: The Linguistic Means of Structuring Utterances, Sentences and Texts" (funded by the DFG between July 2003 and June 2015) have been incorporated into the HZSK Repository.
The following internal documents contain information about the technical implementation and guidelines of the HZSK:
- Technical Report (English)
- Technischer Leitfaden (German)
- Corpus Release Guidelines (English)
- Richtlinien zur Korpusfreigabe (German)
- Richtlinien zur Versionierung (German)
Hedeland, Hanna; Jettka, Daniel & Lehmberg, Timm (2014). Vernetzung statt Vereinheitlichung. Digitale Forschungsinfrastrukturen in den Geisteswissenschaften. In b.i.t. online. Vol. 17, No. 5.
Jettka, Daniel & Stein, Daniel (2014). The HZSK Repository: Implementation, Features, and Use Cases of a Repository for Spoken Language Corpora. In D-Lib Magazine. Vol. 20, No. 9/10. DOI: 10.1045/september2014-jettka
Windhouwer, Menzo; Kemps-Snijders, Marc; Trilsbeek, Paul; Moreira, André; van der Veen, Bas; Silva, Guilherme & von Reihn, Daniel (2016). FLAT: Constructing a CLARIN Compatible Home for Language Resources. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk and Stelios Piperidis (eds.). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 23.-28.05.2016. Portorož, Slovenia. ISBN: 978-2-9517408-9-1
Yeh, Shea-Tinn; Reyes, Fernando; Rynhart, Jeff & Bain, Philip (2016). Deploying Islandora as a Digital Repository Platform: a Multifaceted Experience at the University of Denver Libraries. In D-Lib Magazine. Vol. 22, No. 7/8. DOI: 10.1045/july2016-yeh