Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch

Iris Van De Voorde, Gijsbert Rutten, Rik Vosters, Marijke van der Wal, Wim Vandenbussche

Research output: Contribution to journal › Article › peer-review

Abstract

In this contribution, we present the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus of Early and Late Modern Dutch (ca. 1550-1850). It consists of a digitised collection of handwritten administrative texts (e.g. town council meeting reports), handwritten ego-documents (e.g. diaries and travelogues), and printed pamphlets (e.g. of a political or religious nature). The corpus is also balanced between northern and southern material, with data from the provinces of Holland and Zeeland for the North, and from Flanders and Brabant for the South. After discussing its structure and composition, we illustrate the value of the new corpus with a number of smaller case studies. Based on our experiences with the corpus, we conclude with a plea for historical corpus building not to focus too much on the quantity of data (‘big data’), but rather to shift attention to data quality.

Original language: English
Pages (from-to): 114–132
Number of pages: 19
Journal: Taal en Tongval
Volume: 75
Issue number: 1
Publication status: Published - 2023

Keywords

  • historical corpus building
  • history of Dutch
  • corpus linguistics
  • northern and southern Dutch
  • spelling of long a
  • d- and w-forms
