Abstract
This paper introduces Munazarat, the Corpus of Arabic Competitive Debates. Despite the significance of competitive debating in fostering critical thinking and promoting dialogue, researchers in Arabic Natural Language Processing (NLP), linguistics, argumentation studies, and education have limited access to datasets on competitive debating. At this stage of the study, we introduce Munazarat 1.0, which comprises transcriptions of approximately 50 hours of recordings from 73 debates held at QatarDebate-recognized tournaments, all available on YouTube. Munazarat is a novel, specialized Arabic speech corpus, predominantly in Modern Standard Arabic (MSA), that covers diverse debating topics and is accompanied by metadata for each debate. The debates were transcribed using Fenek, a speech-to-text tool from Kanari AI, and the transcripts were reviewed by three native Arabic speakers to enhance quality. The Munazarat 1.0 dataset can serve as a valuable resource for training Arabic NLP tools, developing argumentation mining systems, and analyzing Arabic argumentation and rhetorical styles.
| Original language | English |
|---|---|
| Title of host publication | 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings |
| Editors | Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed |
| Publisher | ELRA and ICCL |
| Pages | 20-30 |
| Number of pages | 11 |
| ISBN (Electronic) | 9782493814364 |
| ISBN (Print) | 9782493814364 |
| Publication status | Published - 2024 |
| Event | LREC-COLING 2024, Lingotto Conference Centre, Turin, Italy. Duration: 20 May 2024 → 25 May 2024. https://lrec-coling-2024.org |
Publication series
| Name | 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings |
|---|---|
Conference
| Conference | LREC-COLING 2024 |
|---|---|
| Country/Territory | Italy |
| City | Turin |
| Period | 20/05/24 → 25/05/24 |
| Internet address | https://lrec-coling-2024.org |
Bibliographical note
Funding Information: This work was made possible by two QD Fellowship awards [QDRF-2022-01-003] and [QDRF-2022-01-005] from QatarDebate Center. We would also like to thank the group of native Arab students who contributed to this project by carrying out the text review and human validation: Besher Al-Sioufy, Hilmi AbuAlyyan, Moaz Jemmieh, Abdullah Al-Shaar, Jenen Al-Hanai, and Abdullah Al-Kubaisi.
Publisher Copyright:
© 2024 ELRA Language Resource Association.