This paper presents the Wablieft corpus, a two-million-word corpus of a Belgian easy-to-read newspaper written in Dutch. The corpus was automatically annotated with CLARIN tools and is available in several formats for download and online querying through the CLARIN infrastructure. The annotations comprise part-of-speech tagging, chunking, dependency parsing, named entity recognition, morphological analysis, and Universal Dependencies. By making this corpus available, we want to stimulate research into text readability and automated text simplification.
|Title of host publication|Proceedings of CLARIN Annual Conference 2019|
|Editors|Kiril Simov, Maria Eskevich|
|Publication status|Published - 1 Oct 2019|