ParsBERT: Transformer-based model for Persian language understanding

M Farahani, M Gharachorloo, M Farahani… - Neural Processing …, 2021 - Springer
The surge of pre-trained language models has begun a new era in the field of Natural
Language Processing (NLP) by allowing us to build powerful language models. Among …

Investigating the Challenges and Opportunities in Persian Language Information Retrieval through Standardized Data Collections and Deep Learning

S Moniri, T Schlosser, D Kowerko - Computers, 2024 - mdpi.com
The Persian language, also known as Farsi, is distinguished by its intricate morphological
richness, yet it contends with a paucity of linguistic resources. With an estimated 110 million …

No data to crawl? Monolingual corpus creation from PDF files of truly low-resource languages in Peru

G Bustamante, A Oncevay… - Proceedings of the Twelfth …, 2020 - aclanthology.org
We introduce new monolingual corpora for four indigenous and endangered languages
from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine. Given the total absence of these …

Optimizing annotation effort using active learning strategies: A sentiment analysis case study in Persian

SAA Asli, B Sabeti, Z Majdabadi… - Proceedings of the …, 2020 - aclanthology.org
Deep learning models are the current state-of-the-art methodologies for many real-
world problems. However, they need a substantial amount of labeled data to be trained …

GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion

E Rahmati, H Sameti - Findings of the Association for …, 2024 - aclanthology.org
Text-to-Speech (TTS) systems have made significant strides, enabling the
generation of speech from grapheme sequences. However, for low-resource languages …

Producing an Instagram dataset for Persian language sentiment analysis using crowdsourcing method

M Heidari, P Shamsinejad - 2020 6th International Conference …, 2020 - ieeexplore.ieee.org
With the rapid growth in the use of the internet and social media, people can easily share their
opinions on these platforms. Due to this fact, users' comments are considered a rich …

IDPL-PFOD: an image dataset of printed Farsi text for OCR research

F sadat Hosseini, S Kashef, E Shabaninia… - Proceedings of the …, 2021 - aclanthology.org
The existence of appropriate image datasets in the field of optical character recognition
(OCR) plays an essential role in the accuracy of OCR systems. Despite the fact that many …

Matina: A Large-Scale 73B Token Persian Text Corpus

SB Hosseinbeigi, F Taherinezhad, H Faili… - arXiv preprint arXiv …, 2025 - arxiv.org
Text corpora are essential for training models used in tasks like summarization, translation,
and large language models (LLMs). While various efforts have been made to collect …

HmBlogs: A big general Persian corpus

HM Khansari, M Shamsfard - arXiv preprint arXiv:2111.02362, 2021 - arxiv.org
This paper introduces the hmBlogs corpus for Persian, a low-resource language. This
corpus has been prepared based on a collection of nearly 20 million blog posts over a …

FarsBase-KBP: A knowledge base population system for the Persian Knowledge Graph

M Asgari-Bidhendi, B Janfada… - Journal of Web Semantics, 2021 - Elsevier
While most of the knowledge bases already support the English language, there is only one
knowledge base for the Persian language, known as FarsBase, which is automatically …