Abstract
We study highly granular dialect normalization and phonological dialect translation on Limburgish, a non-standardized low-resource language with wide variation in spelling conventions and phonology. We find that embedding the geographic coordinates of dialects improves on the traditional transformer in dialect normalization tasks, and we use these geographically-embedded transformers to translate words between the phonologies of different dialects. The results are consistent with notions in traditional Limburgish dialectology.
Original language | English |
---|---|
Title | VarDial 2024 - 11th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop |
Editors | Yves Scherrer, Tommi Jauhiainen, Nikola Ljubesic, Marcos Zampieri, Preslav Nakov, Jorg Tiedemann |
Place of publication | Mexico City |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 152-162 |
Number of pages | 11 |
ISBN (electronic) | 9798891761049 |
ISBN (print) | 9798891761049 |
Status | Published - 2024 |
Event | 11th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2024 - Mexico City, Mexico. Duration: 20 Jun 2024 → … |
Publication series
Name | VarDial 2024 - 11th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop |
---|
Conference
Conference | 11th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2024 |
---|---|
Country/Region | Mexico |
City | Mexico City |
Period | 20/06/24 → … |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.
Datasets
- Dataset for LimburgishNLP projects
  Simons, A. (Creator), De Pascale, S. (Supervisor) & Franco, K. (Supervisor), GitHub, 2024
  https://github.com/AndreasJCSimons/LimburgishNLP?tab=readme-ov-file
  Dataset