Parsbert: Transformer-based model for persian language understanding
The surge of pre-trained language models has begun a new era in the field of Natural
Language Processing (NLP) by allowing us to build powerful language models. Among …
Language Processing (NLP) by allowing us to build powerful language models. Among …
[HTML][HTML] Investigating the Challenges and Opportunities in Persian Language Information Retrieval through Standardized Data Collections and Deep Learning
The Persian language, also known as Farsi, is distinguished by its intricate morphological
richness, yet it contends with a paucity of linguistic resources. With an estimated 110 million …
richness, yet it contends with a paucity of linguistic resources. With an estimated 110 million …
No data to crawl? monolingual corpus creation from PDF files of truly low-resource languages in Peru
G Bustamante, A Oncevay… - Proceedings of the Twelfth …, 2020 - aclanthology.org
We introduce new monolingual corpora for four indigenous and endangered languages
from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine. Given the total absence of these …
from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine. Given the total absence of these …
Optimizing annotation effort using active learning strategies: A sentiment analysis case study in persian
Deep learning models are the current State-of-the-art methodologies towards many real-
world problems. However, they need a substantial amount of labeled data to be trained …
world problems. However, they need a substantial amount of labeled data to be trained …
GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion
Abstract Text-to-Speech (TTS) systems have made significant strides, enabling the
generation of speech from grapheme sequences. However, for low-resource languages …
generation of speech from grapheme sequences. However, for low-resource languages …
Producing an instagram dataset for persian language sentiment analysis using crowdsourcing method
with the rapid growth of using the internet and social media, people can easily share their
opinions on these platforms. due to this fact, user's comments are considering as a rich …
opinions on these platforms. due to this fact, user's comments are considering as a rich …
[PDF][PDF] Idpl-pfod: an image dataset of printed Farsi text for OCR research
The existence of appropriate image datasets in the field of optical character recognition
(OCR) plays an essential role in the accuracy of OCR systems. Despite the fact that many …
(OCR) plays an essential role in the accuracy of OCR systems. Despite the fact that many …
Matina: A Large-Scale 73B Token Persian Text Corpus
SB Hosseinbeigi, F Taherinezhad, H Faili… - arxiv preprint arxiv …, 2025 - arxiv.org
Text corpora are essential for training models used in tasks like summarization, translation,
and large language models (LLMs). While various efforts have been made to collect …
and large language models (LLMs). While various efforts have been made to collect …
HmBlogs: A big general Persian corpus
This paper introduces the hmBlogs corpus for Persian, as a low resource language. This
corpus has been prepared based on a collection of nearly 20 million blog posts over a …
corpus has been prepared based on a collection of nearly 20 million blog posts over a …
FarsBase-KBP: A knowledge base population system for the Persian Knowledge Graph
M Asgari-Bidhendi, B Janfada… - Journal of Web Semantics, 2021 - Elsevier
While most of the knowledge bases already support the English language, there is only one
knowledge base for the Persian language, known as FarsBase, which is automatically …
knowledge base for the Persian language, known as FarsBase, which is automatically …