The MARSAD Lab works at the intersection of AI, language, and society in the MENA region. Each research area links to the specific publications and datasets it has produced. All dataset details are centralized on the Resources page to avoid duplication.
Building open Arabic corpora has been a core activity of the lab since 2015. Resources cover Modern Standard Arabic, dialectal varieties, learner Arabic, diacritized text, and specialized domains.
Annotated L1 Arabic / L2 English bilingual writer corpus, published with LDC (LDC2022T04).
Large multi-dialect Twitter corpus for gender, age, and language variety identification.
Detection and analysis of hate speech and offensive content in Arabic, with attention to dialectal variation, cultural context, and annotator wellbeing.
Multi-dialectal hate speech corpus published in Frontiers in Artificial Intelligence.
Multi-dialectal Arabic cross-domain stance corpus, LREC-COLING 2024.
Research on the mental and physical health effects of annotating hate speech content.
Work on detecting misinformation, propaganda, check-worthy claims, and persuasion techniques in Arabic and multilingual content.
Long-running international evaluation lab on check-worthiness, subjectivity, factuality, and political bias detection.
Shared task on propaganda techniques and disinformation detection in Arabic text.
LLM safety evaluation for under-resourced languages, cultural adaptation, cognitive alignment, and responsible multilingual LLM development.
Shared task on multimodal detection of hope and hate emotions in Arabic content, ArabicNLP 2025.
Projects on digital humanities, Arabic heritage, and religious text processing with partners across Qatar and the MENA region.
Arabic natural language processing for Qur'anic research: a systematic review, Artificial Intelligence Review 2023.
DTW at Qur'an QA 2022: transfer learning with transformers for QA in a low-resource domain. Best paper award.
Applied NLP and social media analysis addressing mental health, wellbeing, depression detection, and health communication in Arabic populations.
Annotated dataset of emotions and hope speech in English and Arabic, RANLP 2025.
Social Media Analytics and Computational Social Science
Studying how Arab communities use social media across political events, crises, cultural debates, and everyday discourse.
MARSAD AI · Arabic Social Media Observatory
Live social media observatory for Arabic content, funded by QRDI under the Digital Citizenship cluster.
ClimateChat-300K · Multimodal Arabic climate corpus
300K-post multimodal Facebook dataset for climate communication across Arab communities.