Resources
Browse and download the lab's datasets, shared task resources, lexicons, and tools. Most corpora are
released for academic use on request.
Access. Public resources link directly to their catalog entries. Others are shared on request via the
resource access form.
59 of 59 resources
Corpora (35 entries)
Annotated Arabic text and multimodal corpora covering
social media, news, dialectal materials, religious text, and specialized domains.
Audience Engagement with Arabic Women’s
Social Empowerment and Wellbeing: A Decadal Corpus
Arabic
On request women
ArabDiscrim: A Decade-Long Arabic Facebook
Corpus on Racism and Discrimination
Arabic
On request racism facebook
JobArabi: An Arabic Corpus and Analysis of
Job Announcements from Social Media
Arabic
On request labor
ClimateChat-300K: A Multi-Modal Facebook
Dataset for Understanding Diverse Perspectives in Climate Communication
Arabic
On request climate facebook
Longitudinal Trends in Global Climate Change
Discourse on Facebook
Arabic
On request climate facebook
Constructing Gen Z as a Political
Generation: A Yearlong Multi-Platform Analysis of Digital Political Framing Across
Global Contexts
Arabic
On request
Learning to Engage: Modeling Topic-Sensitive
Reactions in Arabic Women’s Online Discourse
Arabic
On request women
D032 KZ-SafetyPrompts 2026
KZ-SafetyPrompts: A Kazakh Safety Evaluation
Prompt Dataset for Large Language Models
Kazakh
On request llm-safety
D033 ChiSafe-PAS 2026
Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for
High-Stakes LLM Safety Evaluation in Chinese
Chinese
On request llm-safety
D034 AlbanianLLMSafety 2026
AlbanianLLMSafety: A Safety Evaluation
Dataset for Large Language Models in Albanian
Albanian
On request llm-safety
D035 StanceNakba 2026
StanceNakba Shared Task: Actor and
Topic-Aware Stance Detection in Public Discourse
Arabic
On request stance
From Posts to Pressure: An Arabic Facebook
Dataset about Stress and Mental-Health Monitoring
Arabic
On request mental-health facebook
Creating a Multilingual Dataset in Arabic and Croatian from Sports Videos
Through a Data Processing Pipeline Combining ASR and MT
Arabic Croatian English
On request
An Annotated Corpus of Arabic Tweets for
Hate Speech Analysis
Arabic
On request hate-speech twitter
EmoHopeSpeech: An Annotated Dataset of
Emotions and Hope Speech in English and Arabic
Arabic English
On request hope-speech emotion
Analyzing Digital Polarization on Hijab: A
Dataset of Annotated YouTube Comments
Arabic
On request youtube
MAHED Shared Task: Multimodal Detection of
Hope and Hate Emotions in Arabic Content
Arabic
On request hate-speech hope-speech emotion
Multi-Dimensional Insights: Annotated
Dataset of Stance, Sentiment, and Emotion in Facebook Comments on Tunisia's July
25 Measures
Arabic
On request stance emotion facebook
MARASTA: A Multi-dialectal Arabic
Cross-domain Stance Corpus
Arabic
On request stance dialectal
Munazarat 1.0: A Corpus of Arabic
Competitive Debates
Arabic
On request argumentation
So Hateful! Building a Multi-Label Hate
Speech Annotated Arabic Dataset
Arabic
On request hate-speech
QCAW 1.0: Building a Qatari Corpus of
Student Argumentative Writing
Arabic
On request argumentation
Hate speech detection with ADHAR: a
multi-dialectal hate speech corpus in Arabic
Arabic
On request hate-speech dialectal
Analyzing Conflict Through Data: A Dataset
on the Digital Framing of Sheikh Jarrah Evictions
Arabic
On request
ThatiAR: Subjectivity Detection in Arabic
News Sentences
Arabic
On request
The FIGNEWS Shared Task on News Media
Narratives
Arabic
On request
Sentiment Analysis and Emotion Annotation of
a Large-Scale Arabic YouTube Trauma Corpus
Arabic
On request mental-health emotion youtube
Potentials of ChatGPT for Annotating Vaccine
Related Tweets
Arabic
On request twitter
ArAIEval Shared Task: Persuasion Techniques
and Disinformation Detection in Arabic Text
Arabic
On request
Overview of the WANLP 2022 Shared Task on
Propaganda Detection in Arabic
Arabic
On request propaganda
ArSarcasm-v2: An Updated Corpus for Sarcasm
Detection in Arabic Tweets
Arabic
On request twitter
DAICT: A Dialectal Arabic Irony Corpus
Extracted from Twitter
Arabic
On request irony twitter dialectal
Building a Corpus of Qatari Arabic
Expressions
Arabic
On request
A Fine-Grained Annotated Multi-Dialectal
Arabic Corpus
Arabic
On request dialectal
A Pilot PropBank Annotation for Quranic Arabic
Arabic
On request islamic syntax-semantics
Lexicons (1 entry)
Specialized word lists covering morphology, MSA
vocabulary, dialect terms, and social media categories.
ARLEX: A Large Scale Comprehensive Lexical
Inventory for Modern Standard Arabic
Arabic
On request
Tools (1 entry)
Software, annotation systems, and analytical platforms.
T001 MARSAD AI Platform 2026
Live Arabic social media observatory
platform with topic modeling, sentiment, toxicity, network, and geographic analysis.
QRDI-funded under the Digital Citizenship cluster.
Arabic
Public web-platform qrdi social-media
Guidelines (2 entries)
Published annotation protocols and methodological
frameworks.
Guidelines and Annotation Framework for
Arabic Author Profiling
Arabic
On request
Toward an Arabic Punctuated Corpus:
Annotation Guidelines and Evaluation
Arabic
On request
Corpora (18 entries)
Overview of the CLEF-2025 CheckThat! Lab
Task 1 on Subjectivity in News Articles
Arabic
On request fact-checking
ImageEval 2025: The First Arabic Image
Captioning Shared Task
Arabic
On request
QIAS2025 Shared Task on Islamic Inheritance
Reasoning and Knowledge Assessment
Arabic
On request islamic
Overview of the CLEF-2024 CheckThat! Lab
Task 2 on Subjectivity in News Articles
Arabic
On request fact-checking
Overview of the CLEF–2023 CheckThat! Lab on
Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority of News
Articles and Their Source
Arabic
On request fact-checking
Overview of the CLEF-2022 CheckThat! Lab
Task 1 on Identifying Relevant Claims in Tweets
Arabic
On request fact-checking twitter
Overview of the CLEF-2022 CheckThat! Lab
Task 2 on Detecting Previously Fact-Checked Claims
Arabic
On request fact-checking
Fighting the COVID-19 Infodemic: Modeling
the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy
Makers, and Society
Arabic
On request fact-checking
Findings of the NLP4IF-2021 Shared Tasks on
Fighting the COVID-19 Infodemic and Censorship Detection
Arabic
On request
On the Author Profiling and Deception
Detection in Arabic shared task at FIRE
Arabic
On request
Overview of the Track on Author Profiling
and Deception Detection in Arabic
Arabic
On request
Overview of the CLEF-2018 CheckThat! Lab
on Automatic Identification and Verification of Political Claims. Task 1:
Check-Worthiness
Arabic
On request fact-checking
Overview of the CLEF-2018 CheckThat! Lab on
Automatic Identification and Verification of Political Claims. Task 2: Factuality
Arabic
On request fact-checking
The Second QALB Shared Task on Automatic
Text Correction for Arabic
Arabic
On request error-correction
The First QALB Shared Task on Automatic Text
Correction for Arabic
Arabic
On request error-correction
A Pilot Arabic Propbank
Arabic
On request syntax-semantics
Evaluation of multilingual text alignment
systems: the ARCADE II project
Multilingual
On request
Lexicons (1 entry)
The MADAR Arabic Dialect Corpus and Lexicon
Arabic
On request dialectal
Guidelines (1 entry)
Building an Arabic Machine Translation
Post-Edited Corpus: Guidelines and Annotation
Arabic
On request