Munazarat 1.0: A Corpus of Arabic Competitive Debates

M.M. Khader, A.G. Al-Sharafi, M.H. Al-Sioufy, W. Zaghouani, A. Al-Zawqari

Research output: Chapter in Book/Report/Conference proceeding > Conference paper

5 Citations (Scopus)

Abstract

This paper introduces the Corpus of Arabic Competitive Debates, Munazarat. Despite the significance of competitive debating in fostering critical thinking and promoting dialogue, researchers in the fields of Arabic Natural Language Processing (NLP), linguistics, argumentation studies, and education have limited access to datasets on competitive debating. At this stage of the study, we introduce Munazarat 1.0, which combines transcribed recordings of approximately 50 hours from 73 debates at QatarDebate-recognized tournaments, all available on YouTube. Munazarat is a novel specialized Arabic speech corpus, predominantly in Modern Standard Arabic (MSA), covering diverse debating topics and accompanied by metadata for each debate. The transcription of debates was performed using Fenek, a speech-to-text Kanari AI tool, and reviewed by three native Arabic speakers to enhance quality. The Munazarat 1.0 dataset can serve as a valuable resource for training Arabic NLP tools, developing argumentation mining machines, and analyzing Arabic argumentation and rhetoric styles.

Original language: English
Title of host publication: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings
Editors: Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed
Publisher: ELRA and ICCL
Pages: 20-30
Number of pages: 11
ISBN (Electronic): 9782493814364
ISBN (Print): 9782493814364
Publication status: Published - 2024
Event: LREC-COLING 2024 - Lingotto Conference Centre, Turin, Italy
Duration: 20 May 2024 to 25 May 2024
https://lrec-coling-2024.org

Publication series

Name: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings

Conference

Conference: LREC-COLING 2024
Country/Territory: Italy
City: Turin
Period: 20/05/24 to 25/05/24
Internet address: https://lrec-coling-2024.org

Bibliographical note

Funding Information:
This work was made possible by two QD Fellowship awards [QDRF-2022-01-003] and [QDRF-2022-01-005] from QatarDebate Center. We would also like to thank the group of native Arab students who contributed to this project by reviewing the text and performing human validation: Besher Al-Sioufy, Hilmi AbuAlyyan, Moaz Jemmieh, Abdullah Al-Shaar, Jenen Al-Hanai, and Abdullah Al-Kubaisi.

Publisher Copyright:
© 2024 ELRA Language Resource Association.
