Vol. 2024 (2024)

A Brief Review on Preprocessing Text in Arabic Language Dataset: Techniques and Challenges

Ahmed Adil Nafea
Department of Artificial Intelligence, College of Computer Science and IT, University of Anbar, Ramadi, Iraq
Muhmmad Shihab Muayad
Department of Computer Networking Systems, College of Computer and Information Technology, University of Anbar, Anbar, Iraq
Russel R Majeed
College of Education for Pure Sciences, University of Thi-Qar, Thi-Qar, Iraq
Ashour Ali
Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia
Omar M. Bashaddadh
Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia
Meaad Ali Khalaf
Department of Computer Science, AUL University, Beirut, Lebanon
Abu Baker Nahid Sami
Department of Computer Science, University of Anbar, Ramadi, Iraq
Amani Steiti
Department of Computer Systems and Networks, Faculty of Information Engineering, University Tishreen, Latakia, Syria

Published 2024-05-18

Keywords

  • Arabic language,
  • Artificial Intelligence,
  • Preprocessing,
  • Natural Language Processing (NLP),
  • Deep Learning,
  • Machine Learning

How to Cite

Nafea, A. A., Muayad, M. S., Majeed, R. R., Ali, A., Bashaddadh, O. M., Khalaf, M. A., Sami, A. B. N., & Steiti, A. (2024). A Brief Review on Preprocessing Text in Arabic Language Dataset: Techniques and Challenges. Babylonian Journal of Artificial Intelligence, 2024, 46–53. https://doi.org/10.58496/BJAI/2024/007

Abstract

Text preprocessing plays an important role in natural language processing (NLP) tasks such as text classification, sentiment analysis, and machine translation. Preprocessing Arabic text still presents unique challenges due to the language's rich morphology, complex grammar, and diverse character sets. This brief review examines techniques used for preprocessing Arabic text data. It discusses the challenges specific to Arabic text and presents an overview of key preprocessing steps, including normalization, tokenization, stemming, stop-word removal, and noise reduction. The survey also analyzes preprocessing techniques for NLP tasks and highlights current research trends and future directions in Arabic text preprocessing.
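To make the named steps (normalization, tokenization, stemming, stop-word removal, and noise reduction) concrete, here is a minimal Python sketch that chains them in one possible order. It is an illustration under stated assumptions, not the method of the authors or of any reviewed study. The stop-word list and the affix lists are small hypothetical samples. The `light_stem` function is a naive prefix/suffix stripper, not a published algorithm such as the ISRI or Khoja stemmer. The normalization rules (unifying alef variants, mapping ى to ي and ة to ه) are common choices that some systems apply differently or skip.

```python
import re

# Arabic diacritics (tashkeel): U+064B (fathatan) to U+0652 (sukun), plus U+0670 (superscript alef)
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # decorative elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # آ أ إ
URL = re.compile(r"https?://\S+|www\.\S+")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")  # anything that is not an Arabic letter or whitespace


def normalize(text: str) -> str:
    """Remove diacritics and tatweel, then unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # hamzated/madda alef -> bare alef ا
    text = text.replace("\u0649", "\u064A")   # alef maqsura ى -> ya ي
    text = text.replace("\u0629", "\u0647")   # ta marbuta ة -> ha ه
    return text


# Hypothetical, deliberately tiny stop-word list, normalized with the same rules as the input
STOP_WORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "هذا", "هذه", "التي", "الذي"]}

# Hypothetical affix lists for a naive light stemmer (longer prefixes are checked first)
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها")


def light_stem(token: str, min_len: int = 2) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[: -len(s)]
            break
    return token


def preprocess(text: str) -> list[str]:
    text = URL.sub(" ", text)          # noise reduction: drop URLs
    text = normalize(text)             # normalization
    text = NON_ARABIC.sub(" ", text)   # noise reduction: drop digits, Latin letters, punctuation
    tokens = text.split()              # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [light_stem(t) for t in tokens]               # light stemming


print(preprocess("ذهبَ الطلابُ إلى المكتبةِ https://example.com"))
# ['ذهب', 'طلاب', 'مكتبه']
```

The order of the steps matters in this sketch: the stop-word list is passed through the same `normalize` function as the input, because otherwise an entry like إلى would never match the normalized token الي. Likewise, diacritics are removed before the non-Arabic filter, so that they are deleted rather than replaced with spaces that would split words.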
