Google Scholar

IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages

J Gala, PA Chitale, R AK, V Gumma… - arXiv preprint arXiv …, 2023 - arxiv.org

India has a rich linguistic landscape with languages from 4 major language families spoken
by over a billion people. 22 of these languages are listed in the Constitution of India …

Cited by 63

[PDF] arxiv.org

Naamapadam: A large-scale named entity annotated data for Indic languages

A Mhaske, H Kedia, S Doddapaneni… - arXiv preprint arXiv …, 2022 - arxiv.org

We present Naamapadam, the largest publicly available Named Entity Recognition (NER)
dataset for the 11 major Indian languages from two language families. The dataset contains …

Cited by 22

[PDF] academia.edu

A survey on NLP resources, tools, and techniques for Marathi language processing

P Lahoti, N Mittal, G Singh - ACM Transactions on Asian and Low …, 2022 - dl.acm.org

Natural Language Processing (NLP) has been in practice for the past couple of decades,
and extensive work has been done for the Western languages, particularly the English …

Cited by 26

[PDF] arxiv.org

User-aware multilingual abusive content detection in social media

MZU Rehman, S Mehta, K Singh, K Kaushik… - Information Processing & …, 2023 - Elsevier

Despite growing efforts to halt distasteful content on social media, multilingualism has added
a new dimension to this problem. The scarcity of resources makes the challenge even …

Cited by 12

[PDF] arxiv.org

IndicLLMSuite: A blueprint for creating pre-training and fine-tuning datasets for Indian languages

MSUR Khan, P Mehta, A Sankar… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite the considerable advancements in English LLMs, the progress in building
comparable models for other languages has been hindered due to the scarcity of tailored …

Cited by 8

[PDF] arxiv.org

Towards building text-to-speech systems for the next billion users

GK Kumar, SV Praveen, P Kumar… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Deep learning based text-to-speech (TTS) systems have been evolving rapidly with
advances in model architectures, training methodologies, and generalization across …

Cited by 17

[PDF] arxiv.org

Bhasha-Abhijnaanam: Native-script and romanized language identification for 22 Indic languages

Y Madhani, MM Khapra, A Kunchukuttan - arXiv preprint arXiv:2305.15814, 2023 - arxiv.org

We create publicly available language identification (LID) datasets and models in all 22
Indian languages listed in the Indian constitution in both native-script and romanized text …

Cited by 12

[PDF] arxiv.org

Romanization-based large-scale adaptation of multilingual language models

S Purkayastha, S Ruder, J Pfeiffer, I Gurevych… - arXiv preprint arXiv …, 2023 - arxiv.org

Large multilingual pretrained language models (mPLMs) have become the de facto state of
the art for cross-lingual transfer in NLP. However, their large-scale deployment to many …

Cited by 9

[PDF] mit.edu

Context-aware transliteration of romanized South Asian languages

C Kirov, C Johny, A Katanova, A Gutkin… - Computational …, 2024 - direct.mit.edu

While most transliteration research is focused on single tokens such as named entities—for
example, transliteration of અમદાવાદ from the Gujarati script to the Latin script “Ahmedabad” …

Cited by 5

[PDF] aclanthology.org

Improving pretraining techniques for code-switched NLP

R Das, S Ranjan, S Pathak, P Jyothi - Proceedings of the 61st …, 2023 - aclanthology.org

Pretrained models are a mainstay in modern NLP applications. Pretraining requires access
to large volumes of unlabeled text. While monolingual text is readily available for many of …

Cited by 5
