Multilingual sentiment analysis for under-resourced languages: a systematic review of the landscape
KR Mabokela, T Celik, M Raborife - IEEE Access, 2022 - ieeexplore.ieee.org
Sentiment analysis automatically evaluates people's opinions of products or services. It is an
emerging research area with promising advancements in high-resource languages such as …
DEMix layers: Disentangling domains for modular language modeling
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a
language model (LM) on the domain of the input text. A DEMix layer is a collection of expert …
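A minimal sketch of what a DEMix-style feedforward block could look like: one expert FFN per domain, with the routing weights one-hot at training time (known domain) and a soft posterior over domains at inference. Layer sizes, names, and shapes are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DEMixFeedForward(nn.Module):
    """Sketch of a domain-expert-mixture feedforward layer (one FFN per domain)."""

    def __init__(self, num_domains: int, d_model: int, d_hidden: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_domains)
        )

    def forward(self, x: torch.Tensor, domain_weights: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); domain_weights: (num_domains,)
        # Training with a known domain: domain_weights is one-hot, so exactly
        # one expert is active. Inference: a posterior over domains mixes them.
        expert_outs = torch.stack([expert(x) for expert in self.experts])  # (D, B, S, H)
        return torch.einsum("d,dbsh->bsh", domain_weights, expert_outs)
```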
Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …
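To make the granularity spectrum concrete, here is a toy illustration of the same string segmented into bytes, characters, and whitespace words; the whitespace split is a crude stand-in for a real word tokenizer, with subword schemes like BPE sitting between the extremes.

```python
text = "Tokenization isn't trivial."

# Byte-level units: every string decomposes into UTF-8 bytes.
byte_units = list(text.encode("utf-8"))

# Character-level units.
char_units = list(text)

# Word-ish units via naive whitespace splitting.
word_units = text.split()

print(len(byte_units), len(char_units), len(word_units))
# The usual trade-off: finer units mean small vocabularies but long
# sequences; coarser units mean short sequences but huge vocabularies.
```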
BLOOM+1: Adding language support to BLOOM for zero-shot prompting
The BLOOM model is a large publicly available multilingual language model, but its
pretraining was limited to 46 languages. To extend the benefits of BLOOM to other …
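One adaptation strategy in this space is continued causal-LM pretraining on monolingual text in the new language. A hedged sketch follows; the checkpoint, learning rate, and per-sentence loop are illustrative assumptions, not the paper's exact setup, and `target_language_sentences` is a hypothetical data source.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small BLOOM variant, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical iterable of raw text in the language being added.
target_language_sentences = ["..."]

model.train()
for sentence in target_language_sentences:
    batch = tokenizer(sentence, return_tensors="pt", truncation=True)
    # Standard language-modeling objective: the inputs serve as their own labels.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```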
AmericasNLI: Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot
setting, even for languages unseen during pretraining. However, prior work evaluating …
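The zero-shot protocol this line of work evaluates is simple to state: a multilingual encoder fine-tuned on NLI in a high-resource language is applied, unchanged, to premise/hypothesis pairs in an unseen language. A minimal sketch, assuming an XNLI-style fine-tuned checkpoint (the name and label ordering below are assumptions):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "joeddav/xlm-roberta-large-xnli"  # assumed NLI-tuned multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def predict(premise: str, hypothesis: str) -> int:
    # Encode the pair and return the argmax index into the model's NLI labels.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```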
Expanding pretrained models to thousands more languages via lexicon-based adaptation
The performance of multilingual pretrained models is highly dependent on the availability of
monolingual or parallel text present in a target language. Thus, the majority of the world's …
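The core idea of lexicon-based adaptation is to synthesize noisy pseudo-target-language text by swapping source words for target counterparts from a bilingual lexicon (resources such as PanLex cover thousands of languages). A toy sketch, with hypothetical lexicon entries:

```python
# Hypothetical bilingual lexicon entries; real ones would come from a
# large lexicon resource rather than being written by hand.
bilingual_lexicon = {
    "water": "wai",
    "good": "pai",
}

def synthesize(sentence: str, lexicon: dict[str, str]) -> str:
    # Word-by-word substitution; out-of-lexicon words are kept as-is,
    # which is one source of noise in this kind of synthetic data.
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())

print(synthesize("The water is good", bilingual_lexicon))
# -> "the wai is pai"
```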
IndoBERTweet: A pretrained language model for Indonesian Twitter with effective domain-specific vocabulary initialization
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that
is trained by extending a monolingually-trained Indonesian BERT model with additive …
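A sketch of the vocabulary-initialization idea associated with this line of work: a new domain-specific token's embedding starts as the mean of the embeddings its surface form receives under the existing tokenizer, rather than a random vector. The function name and arguments below are illustrative.

```python
import torch

def init_new_token_embedding(
    new_token: str, old_tokenizer, old_embeddings: torch.Tensor
) -> torch.Tensor:
    # Segment the new token's surface form with the existing vocabulary...
    old_ids = old_tokenizer(new_token, add_special_tokens=False)["input_ids"]
    # ...and average those subword embeddings as the starting point.
    return old_embeddings[old_ids].mean(dim=0)
```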
How to adapt your pretrained multilingual model to 1600 languages
Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer,
performing best for languages seen during pretraining. While methods exist to improve …
When being unseen from mBERT is just the beginning: Handling new languages with multilingual language models
Transfer learning based on pretraining language models on a large amount of raw data has
become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear …
Breaking physical and linguistic borders: Multilingual federated prompt tuning for low-resource languages
Pretrained large language models (LLMs) have emerged as a cornerstone in modern
natural language processing, with their utility expanding to various applications and …
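A minimal sketch of the federated prompt-tuning pattern, assuming the backbone LLM stays frozen and only small soft-prompt tensors are trained locally and aggregated FedAvg-style; `model_step`, the sizes, and the client structure are hypothetical.

```python
import torch

PROMPT_LEN, D_MODEL = 20, 768  # illustrative prompt length and hidden size

def local_update(global_prompt: torch.Tensor, client_batches, model_step) -> torch.Tensor:
    # Each client tunes only its copy of the soft prompt on private data.
    prompt = global_prompt.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt], lr=1e-3)
    for batch in client_batches:
        loss = model_step(prompt, batch)  # hypothetical frozen-LLM forward + loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return prompt.detach()

def federated_round(global_prompt: torch.Tensor, clients) -> torch.Tensor:
    # Only prompts cross the network; the server aggregates by averaging.
    updates = [local_update(global_prompt, data, step) for data, step in clients]
    return torch.stack(updates).mean(dim=0)
```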