Arabic Fake News Detection Across Generational Text Representations: From Traditional Models to Transformer-Based Methodologies
- 1 Department of Computer Science, Faculty of Information Technology, The World Islamic Sciences and Education University, Amman, Jordan
Abstract
The rapid proliferation of fake news on Arabic social media has amplified societal and political risks, yet research on automatic detection in Arabic remains limited by scarce datasets, morphological complexity, and underexplored preprocessing strategies. This study presents a comprehensive benchmark for Arabic fake news detection that evaluates seven Machine Learning (ML) algorithms, three Deep Learning (DL) models, and a transformer-based approach (AraBERT) under consistent experimental conditions. A hybrid balanced dataset of 4,838 tweets was constructed from ArCOV19-Rumors, AraCOVID19-MFH, and NLP4IF-2021. Three levels of preprocessing were systematically evaluated: primitive cleaning and tokenization, named entity recognition (NER), and NER with stemming. The results show a clear generational progression in text representation: TF-IDF provides strong lexical baselines, AraVec yields moderate gains through static embeddings, AraBERT embeddings yield substantial improvements through contextualization, and fine-tuned AraBERT achieves the best results (Accuracy/F1 ≈ 0.95). A comparative analysis shows that SVM is the strongest ML algorithm, Bi-LSTM is the strongest DL model, and contextual embeddings substantially improve performance across all model families. The effect of preprocessing depends on the model type: stemming helps ML models but hurts DL models, whereas NER consistently helps both. This study provides solid baselines, methodological insights, and a generational perspective on Arabic text representations, establishing a foundation for future research aimed at combating misinformation in Arabic NLP.
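The first preprocessing level named in the abstract, primitive cleaning and tokenization, might look roughly like the following sketch. The specific rules (stripping URLs, mentions, diacritics, and tatweel; normalizing alef variants; dropping non-Arabic characters) are assumptions chosen for illustration, since the abstract does not list the exact steps used. The names `clean_tweet` and `tokenize` are hypothetical.

```python
import re

# All rules below are illustrative assumptions, not the study's documented pipeline.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Arabic short-vowel marks (U+064B..U+0652) and superscript alef (U+0670).
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character
# Alef with hamza above/below and alef with madda, mapped to bare alef (U+0627).
ALEF_VARIANTS_RE = re.compile("[\u0623\u0625\u0622]")
# Anything that is neither an Arabic letter (U+0621..U+064A) nor whitespace.
NON_ARABIC_RE = re.compile("[^\u0621-\u064A\\s]")


def clean_tweet(text: str) -> str:
    """Apply a simple, assumed cleaning sequence to one tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)
    text = NON_ARABIC_RE.sub(" ", text)  # drops digits, punctuation, '#', Latin text
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of the cleaned tweet."""
    return clean_tweet(text).split()
```

Whitespace tokenization is the simplest possible choice here; a study of this kind may well use a dedicated Arabic tokenizer instead, and the NER and stemming levels would be applied on top of output like this.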
DOI: https://doi.org/10.3844/jcssp.2026.1313.1329
Copyright: © 2026 Noor M. Alkudah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Arabic Fake News
- Generational Benchmarking
- Hybrid Datasets
- NER
- Stemming
- AraBERT