Aya model: An instruction finetuned open-access multilingual language model
Recent breakthroughs in large language models (LLMs) have centered around a handful of
data-rich languages. What does it take to broaden access to breakthroughs beyond first …
Natural language understanding of Devanagari script languages: Language identification, hate speech and its target detection
The growing use of Devanagari-script languages such as Hindi, Nepali, Marathi, Sanskrit,
and Bhojpuri on social media presents unique challenges for natural language …
Aya dataset: An open-access collection for multilingual instruction tuning
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …
MC²: Towards transparent and culturally-aware NLP for minority languages in China
Current large language models demonstrate deficiencies in understanding low-resource
languages, particularly the minority languages in China. This limitation stems from the …
Bhasa: A holistic Southeast Asian linguistic and cultural evaluation suite for large language models
WQ Leong, JG Ngui, Y Susanto, H Rengarajan… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) and the emergence of novel
abilities with scale have necessitated the construction of holistic, diverse and challenging …
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology
Existing research in measuring and mitigating gender bias predominantly centers on
English, overlooking the intricate challenges posed by non-English languages and the …
Airavata: Introducing Hindi instruction-tuned LLM
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata
was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make …
OffensEval 2023: Offensive language identification in the age of Large Language Models
The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular,
attracting over 1300 participating teams. The two editions of the shared task helped advance …
Too late to train, too early to use? A study on necessity and viability of low-resource Bengali LLMs
Each new generation of English-oriented Large Language Models (LLMs) exhibits
enhanced cross-lingual transfer capabilities and significantly outperforms older LLMs on low …
Vacaspati: A diverse corpus of bangla literature
Bangla (or Bengali) is the fifth most spoken language globally; yet, the state-of-the-art NLP in
Bangla is lagging for even simple tasks such as lemmatization, POS tagging, etc. This is …