BLOOM: A 176B-parameter open-access multilingual language model
Large language models (LLMs) have been shown to be able to perform new tasks based on
a few demonstrations or natural language instructions. While these capabilities have led to …
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …
A comprehensive survey on various fully automatic machine translation evaluation metrics
The fast advancement in machine translation models necessitates the development of
accurate evaluation metrics that would allow researchers to track the progress in text …
Automatic Speech Recognition: A survey of deep learning techniques and approaches
H Ahlawat, N Aggarwal, D Gupta - International Journal of Cognitive …, 2025 - Elsevier
Significant research has been conducted during the last decade on the application of
machine learning for speech processing, particularly speech recognition. However, in recent …
Re-contextualizing fairness in NLP: The case of India
Recent research has revealed undesirable biases in NLP data and models. However, these
efforts focus on social disparities in the West, and are not directly portable to other geo-cultural …
iNLTK: Natural language toolkit for Indic languages
G Arora - arXiv preprint arXiv:2009.12534, 2020 - arxiv.org
We present iNLTK, an open-source NLP library consisting of pre-trained language models
and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence …
Sentiment analysis using XLM-R transformer and zero-shot transfer learning on resource-poor Indian language
Sentiment analysis on social media relies on comprehending the natural language and
using a robust machine learning technique that learns multiple layers of representations or …
HOTTEST: Hate and offensive content identification in Tamil using transformers and enhanced stemming
Offensive content or hate speech is defined as any form of communication that aims to
annoy, harass, disturb, or anger an individual or community based on factors such as faith …
Fighting hate speech from bilingual Hinglish speaker's perspective, a transformer- and translation-based approach
Many people have begun to use social media platforms due to the increased use of the
Internet over the previous decade. It has a lot of benefits, but it also comes with a lot of risks …
Indic-transformers: An analysis of transformer language models for Indian languages
Language models based on the Transformer architecture have achieved state-of-the-art
performance on a wide range of NLP tasks such as text classification, question-answering …