Wablieft: An Easy-to-Read Newspaper Corpus for Dutch

Vincent Vandeghinste, Bram Bulté, Liesbeth Augustinus

Research output: Chapter in Book/Report/Conference proceeding, Conference paper

Abstract

This paper presents the Wablieft corpus, a two-million-word corpus of a Belgian easy-to-read newspaper written in Dutch. The corpus was automatically annotated with CLARIN tools and is available in several formats for download and online querying through the CLARIN infrastructure. The annotations comprise part-of-speech tagging, chunking, dependency parsing, named entity recognition, morphological analysis, and Universal Dependencies. By making this corpus available, we want to stimulate research into text readability and automated text simplification.
Original language: English
Title of host publication: Proceedings of CLARIN Annual Conference 2019
Editors: Kiril Simov, Maria Eskevich
Pages: 188-191
Publication status: Published - 1 Oct 2019
Externally published: Yes
