Research

The MARSAD Lab works at the intersection of AI, language, and society in the MENA region. Each research area is tied to the specific publications and datasets behind it. All dataset details are centralized on the Resources page to avoid duplication.

Arabic Language Resources and Corpora

Building open Arabic corpora has been a core activity of the lab since 2015. Resources cover Modern Standard Arabic, dialectal varieties, learner Arabic, diacritized text, and specialized domains.

QCAW · Qatari Corpus of Argumentative Writing

Annotated L1 Arabic / L2 English bilingual writer corpus, published with LDC (LDC2022T04).

ARAP-Tweet · Arabic Author Profiling Corpus

Large multi-dialect Twitter corpus for gender, age, and language variety identification.


Social Media Analytics and Computational Social Science

Studying how Arab communities use social media across political events, crises, cultural debates, and everyday discourse.

MARSAD AI · Arabic Social Media Observatory

Live social media observatory for Arabic content, funded by QRDI under the Digital Citizenship cluster.

ClimateChat-300K · Multimodal Arabic climate corpus

300K-post multimodal Facebook dataset for climate communication across Arab communities.


Hate Speech, Offensive Language, and Content Integrity

Detection and analysis of hate speech and offensive content in Arabic with attention to dialectal variation, cultural context, and annotator wellbeing.

ADHAR · Multi-Dialectal Arabic Hate Speech Corpus

Multi-dialectal hate speech corpus published in Frontiers in Artificial Intelligence.

MARASTA · Multi-Dialectal Cross-Domain Stance Corpus

Multi-dialectal Arabic cross-domain stance corpus, LREC-COLING 2024.

Emotional Toll of Hate Speech Annotation

Research on the mental and physical health effects of annotating hate speech content.


Fact-Checking, Propaganda, and Information Disorder

Work on detecting misinformation, propaganda, check-worthy claims, and persuasion techniques in Arabic and multilingual content.

CLEF CheckThat! Lab (2018-2025)

Long-running international evaluation lab on check-worthiness, subjectivity, factuality, and political bias detection.

ArAIEval · Propaganda Techniques in Arabic

Shared task on propaganda techniques and disinformation detection in Arabic text.


LLM Safety and Evaluation

LLM safety evaluation for under-resourced languages, cultural adaptation, cognitive alignment, and responsible multilingual LLM development.

MAHED · Multimodal Hope and Hate Detection in Arabic

Shared task on multimodal detection of hope and hate emotions in Arabic content, ArabicNLP 2025.


Digital Humanities and Cultural Heritage

Projects on digital humanities, Arabic heritage, and religious text processing with partners across Qatar and the MENA region.

Qur'anic NLP Systematic Review

Systematic review of Arabic natural language processing for Qur'anic research, published in Artificial Intelligence Review (2023).

Qur'an QA Shared Task (Best Paper)

DTW at Qur'an QA 2022: transfer learning with transformers for QA in a low-resource domain. Best paper award.


Health, Mental Health, and Wellbeing

Applied NLP and social media analysis addressing mental health, wellbeing, depression detection, and health communication in Arabic populations.

EmoHopeSpeech · Bilingual Emotion and Hope Corpus

Annotated dataset of emotions and hope speech in English and Arabic, RANLP 2025.
