IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages
India has a rich linguistic landscape with languages from 4 major language families spoken
by over a billion people. 22 of these languages are listed in the Constitution of India …
Naamapadam: A large-scale named entity annotated data for Indic languages
We present Naamapadam, the largest publicly available Named Entity Recognition (NER)
dataset for the 11 major Indian languages from two language families. The dataset contains …
A survey on NLP resources, tools, and techniques for Marathi language processing
Natural Language Processing (NLP) has been in practice for the past couple of decades,
and extensive work has been done for the Western languages, particularly the English …
User-aware multilingual abusive content detection in social media
Despite growing efforts to halt distasteful content on social media, multilingualism has added
a new dimension to this problem. The scarcity of resources makes the challenge even …
IndicLLMSuite: a blueprint for creating pre-training and fine-tuning datasets for Indian languages
MSUR Khan, P Mehta, A Sankar… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the considerable advancements in English LLMs, the progress in building
comparable models for other languages has been hindered due to the scarcity of tailored …
Towards building text-to-speech systems for the next billion users
Deep learning based text-to-speech (TTS) systems have been evolving rapidly with
advances in model architectures, training methodologies, and generalization across …
Bhasha-Abhijnaanam: Native-script and romanized language identification for 22 Indic languages
We create publicly available language identification (LID) datasets and models in all 22
Indian languages listed in the Indian constitution in both native-script and romanized text …
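The native-script vs. romanized distinction above can be partly illustrated in code: deciding which script a string is written in needs only Unicode block ranges, whereas deciding its language (for example, Hindi vs. Marathi, both written in Devanagari, or any romanized input) requires a trained LID model of the kind the paper describes. The Python sketch below is a hypothetical heuristic, not the paper's method; the names `script_of`, `dominant_script`, and `is_romanized` are illustrative, and only a subset of the scripts used by the 22 scheduled languages is covered.

```python
from collections import Counter
from typing import Optional

# Hypothetical table: Unicode block ranges (half-open) for some scripts used
# by scheduled Indian languages. Not exhaustive; e.g. Devanagari alone covers
# Hindi, Marathi, Nepali, Sanskrit and others, so script != language.
SCRIPT_BLOCKS = {
    "Arabic": (0x0600, 0x0700),      # Perso-Arabic (Urdu, Kashmiri, Sindhi)
    "Devanagari": (0x0900, 0x0980),
    "Bengali": (0x0980, 0x0A00),     # also used for Assamese
    "Gurmukhi": (0x0A00, 0x0A80),
    "Gujarati": (0x0A80, 0x0B00),
    "Oriya": (0x0B00, 0x0B80),
    "Tamil": (0x0B80, 0x0C00),
    "Telugu": (0x0C00, 0x0C80),
    "Kannada": (0x0C80, 0x0D00),
    "Malayalam": (0x0D00, 0x0D80),
    "Ol Chiki": (0x1C50, 0x1C80),    # Santali
    "Meetei Mayek": (0xABC0, 0xAC00),  # Manipuri
}


def script_of(ch: str) -> Optional[str]:
    """Return the script name of a single character, or None if unknown."""
    if ch.isascii() and ch.isalpha():
        return "Latin"  # basic ASCII letters treated as romanized text
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_BLOCKS.items():
        if lo <= cp < hi:
            return name
    return None  # spaces, digits, punctuation, other scripts


def dominant_script(text: str) -> Optional[str]:
    """Majority script over all classifiable characters in `text`."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def is_romanized(text: str) -> bool:
    """True when Latin letters dominate, i.e. the input is likely romanized."""
    return dominant_script(text) == "Latin"


if __name__ == "__main__":
    print(dominant_script("नमस्ते दुनिया"))  # Devanagari
    print(dominant_script("અમદાવાદ"))        # Gujarati
    print(is_romanized("namaste duniya"))   # True
```

A majority vote is used so that stray Latin punctuation or brand names inside native-script text do not flip the decision; the output only routes text to a native-script or romanized LID model and does not itself identify the language.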
Romanization-based large-scale adaptation of multilingual language models
Large multilingual pretrained language models (mPLMs) have become the de facto state of
the art for cross-lingual transfer in NLP. However, their large-scale deployment to many …
Context-aware transliteration of romanized South Asian languages
While most transliteration research is focused on single tokens such as named entities, for
example, the transliteration of અમદાવાદ from the Gujarati script to the Latin script “Ahmedabad” …
Improving pretraining techniques for code-switched NLP
Pretrained models are a mainstay in modern NLP applications. Pretraining requires access
to large volumes of unlabeled text. While monolingual text is readily available for many of …