Arabic natural language processing: An overview

I Guellil, H Saâdane, F Azouaou, B Gueni… - Journal of King Saud …, 2021 - Elsevier
Arabic is recognised as the 4th most used language of the Internet. Arabic has three main
varieties: (1) classical Arabic (CA), (2) Modern Standard Arabic (MSA), (3) Arabic Dialect (AD) …

Automated fact‐checking: A survey

X Zeng, AS Abumansour… - Language and Linguistics …, 2021 - Wiley Online Library
As online false information continues to grow, automated fact‐checking has gained an
increasing amount of attention in recent years. Researchers in the field of Natural Language …

Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

AraBERT: Transformer-based model for Arabic language understanding

W Antoun, F Baly, H Hajj - arXiv preprint arXiv:2003.00104, 2020 - arxiv.org
The Arabic language is a morphologically rich language with relatively few resources and a
less explored syntax compared to English. Given these limitations, Arabic Natural Language …
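
A minimal sketch of loading the model through Hugging Face transformers; the checkpoint ID "aubmindlab/bert-base-arabertv2" is an assumption about the public release, not something quoted from the paper, and the paper's own Farasa pre-segmentation step is omitted here:

    # Checkpoint ID is assumed; substitute whichever AraBERT release you use.
    from transformers import AutoModel, AutoTokenizer

    MODEL_ID = "aubmindlab/bert-base-arabertv2"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)

    # Encode an Arabic sentence and inspect the contextual embeddings.
    inputs = tokenizer("مرحبا بالعالم", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])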

Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …
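
To make the comparison concrete, here is a toy English pipeline with a few of the commonly compared steps; the choice and order of steps are illustrative assumptions, not the survey's recommendation:

    # Toy pipeline: case folding, URL removal, punctuation stripping,
    # whitespace tokenization, stopword filtering.
    import re
    import string

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}  # toy list

    def preprocess(text):
        text = text.lower()
        text = re.sub(r"https?://\S+", " ", text)
        text = text.translate(str.maketrans("", "", string.punctuation))
        return [t for t in text.split() if t not in STOPWORDS]

    print(preprocess("The results of preprocessing: see https://example.org"))
    # -> ['results', 'preprocessing', 'see']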

The interplay of variant, size, and task type in Arabic pre-trained language models

G Inoue, B Alhafni, N Baimukan, H Bouamor… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we explore the effects of language variants, data sizes, and fine-tuning task
types in Arabic pre-trained language models. To do so, we build three pre-trained language …
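
The variant-controlled checkpoints this work describes were released publicly; below is a sketch of probing two of them with a fill-mask pipeline, where the Hugging Face IDs (CAMeL-Lab/bert-base-arabic-camelbert-*) are assumed from that release rather than stated in the snippet:

    # Probe variant-specific checkpoints with the same masked sentence.
    from transformers import pipeline

    for variant in ("msa", "da"):  # Modern Standard Arabic vs. dialectal
        fill = pipeline(
            "fill-mask",
            model=f"CAMeL-Lab/bert-base-arabic-camelbert-{variant}",
        )
        top = fill("اللغة العربية [MASK] جدا")[0]
        print(variant, top["token_str"], round(top["score"], 3))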

How good is your tokenizer? On the monolingual performance of multilingual language models

P Rust, J Pfeiffer, I Vulić, S Ruder… - arXiv preprint arXiv …, 2020 - arxiv.org
In this work, we provide a systematic and comprehensive empirical comparison of pretrained
multilingual language models versus their monolingual counterparts with regard to their …
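
The flavor of such a comparison can be reproduced with subword fertility (average subwords per whitespace word), one of the metrics this line of work uses; the two model IDs below are assumptions chosen for contrast:

    # Compare how many subwords each tokenizer needs per Arabic word.
    from transformers import AutoTokenizer

    multi = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    mono = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")

    def fertility(tokenizer, sentence):
        words = sentence.split()
        return len(tokenizer.tokenize(sentence)) / len(words)

    sent = "معالجة اللغة العربية الطبيعية"
    print("multilingual:", fertility(multi, sent))
    print("monolingual:", fertility(mono, sent))

Lower fertility generally indicates a vocabulary that fits the language better, which is one mechanism behind a monolingual model's edge.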

CAMeL Tools: An open source Python toolkit for Arabic natural language processing

O Obeid, N Zalmout, S Khalifa, D Taji… - Proceedings of the …, 2020 - aclanthology.org
We present CAMeL Tools, a collection of open-source tools for Arabic natural
language processing in Python. CAMeL Tools currently provides utilities for pre-processing …
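
A small usage sketch, assuming the toolkit is installed (pip install camel-tools) and using its documented dediacritization and word-tokenization utilities:

    # Strip diacritics, then tokenize; both helpers come from CAMeL Tools.
    from camel_tools.utils.dediac import dediac_ar
    from camel_tools.tokenizers.word import simple_word_tokenize

    text = "ذَهَبَ الوَلَدُ إِلَى المَدْرَسَةِ."
    clean = dediac_ar(text)               # remove Arabic diacritical marks
    tokens = simple_word_tokenize(clean)  # also splits off punctuation
    print(tokens)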

LAraBench: Benchmarking Arabic AI with large language models

A Abdelali, H Mubarak, SA Chowdhury… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in Large Language Models (LLMs) have significantly influenced the
landscape of language and speech research. Despite this progress, these models lack …

Pre-training BERT on Arabic tweets: Practical considerations

A Abdelali, S Hassan, H Mubarak, K Darwish… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretraining Bidirectional Encoder Representations from Transformers (BERT) for
downstream NLP tasks is a non-trivial task. We pretrained 5 BERT models that differ in the …