Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch

Iris Van De Voorde, Gijsbert Rutten, Rik Vosters, Marijke van der Wal, Wim Vandenbussche

Research output: Contribution to journal, Article, peer-review



In this contribution, we present the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus of Early and Late Modern Dutch (ca. 1550–1850). It consists of a digitised collection of three text types: handwritten administrative texts (e.g. reports of town council meetings), handwritten ego-documents (e.g. diaries and travelogues), and printed pamphlets (e.g. of a political or religious nature). The corpus is also balanced between northern and southern material, with data from the provinces of Holland and Zeeland for the North, and from Flanders and Brabant for the South. After discussing its structure and composition, we illustrate the value of the new corpus with a number of smaller case studies. Based on our experiences with the corpus, we conclude with a plea that historical corpus building should not focus too much on the quantity of data ('big data'), but should rather shift its attention to data quality.

Original language: English
Pages (from-to): 114–132
Number of pages: 19
Journal: Taal en Tongval
Issue number: 1
Publication status: Published (2023)


  • historical corpus building
  • history of Dutch
  • corpus linguistics
  • northern and southern Dutch
  • spelling of long a
  • d- and w-forms


