XLM-V: Overcoming the vocabulary bottleneck in multilingual masked language models

D Liang, H Gonen, Y Mao, R Hou, N Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
Large multilingual language models typically rely on a single vocabulary shared across
100+ languages. As these models have increased in parameter count and depth …

E-BERT: Efficient-yet-effective entity embeddings for BERT

N Poerner, U Waltinger, H Schütze - arXiv preprint arXiv:1911.03681, 2019 - arxiv.org
We present a novel way of injecting factual knowledge about entities into the pretrained
BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al …
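
The snippet names the core mechanism: Wikipedia2Vec entity vectors are aligned with BERT's wordpiece embedding space so that entities can be fed to the pretrained model. A minimal sketch of that kind of alignment, assuming a plain least-squares linear map fit on a shared vocabulary; the matrices below are random stand-ins, not the actual Wikipedia2Vec or BERT weights:

```python
import numpy as np

# Toy stand-ins: rows are vectors for tokens present in BOTH spaces.
# In the paper these would be Wikipedia2Vec word vectors and BERT wordpiece
# embeddings for the shared vocabulary; here they are random.
rng = np.random.default_rng(0)
wiki_word_vecs = rng.normal(size=(1000, 100))   # Wikipedia2Vec side (d_src = 100)
bert_word_embs = rng.normal(size=(1000, 768))   # BERT wordpiece side (d_tgt = 768)

# Fit a linear map W minimizing ||wiki_word_vecs @ W - bert_word_embs||_F
# (ordinary least squares over the shared vocabulary).
W, *_ = np.linalg.lstsq(wiki_word_vecs, bert_word_embs, rcond=None)

# Apply the same map to entity vectors so they live in BERT's input space
# and can be consumed alongside ordinary wordpiece embeddings.
entity_vecs = rng.normal(size=(50, 100))        # Wikipedia2Vec entity vectors
aligned_entities = entity_vecs @ W              # shape (50, 768)
```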

exBERT: Extending pre-trained models with domain-specific vocabulary under constrained training resources

W Tai, HT Kung, XL Dong, M Comiter… - Findings of the …, 2020 - aclanthology.org
We introduce exBERT, a training method to extend BERT pre-trained models from a general
domain to a new pre-trained model for a specific domain with a new additive vocabulary …
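
The entry's key idea is extending a general-domain checkpoint with a new, additive domain vocabulary. The sketch below shows only a simpler, generic version of that step using Hugging Face Transformers (add tokens, resize embeddings); exBERT's separate extension module and frozen original weights are not reproduced here:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Start from a general-domain checkpoint (any BERT-style MLM works).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical domain-specific tokens; in practice these would be mined from
# a domain corpus, e.g. by training a domain tokenizer.
new_tokens = ["pharmacokinetics", "bioavailability", "immunoassay"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new ids get (randomly initialized) rows.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size = {len(tokenizer)}")
```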

Taming pre-trained language models with n-gram representations for low-resource domain adaptation

S Diao, R Xu, H Su, Y Jiang, Y Song… - Proceedings of the 59th …, 2021 - aclanthology.org
Large pre-trained models such as BERT are known to improve different downstream NLP
tasks, even when such a model is trained on a generic domain. Moreover, recent studies …

FOCUS: Effective embedding initialization for monolingual specialization of multilingual models

K Dobler, G De Melo - arXiv preprint arXiv:2305.14481, 2023 - arxiv.org
Using model weights pretrained on a high-resource language as a warm start can reduce
the need for data and compute to obtain high-quality language models for other, especially …
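
FOCUS's stated contribution is embedding initialization for a new target-language vocabulary. A toy sketch in that spirit, assuming overlapping tokens copy their pretrained embeddings and new tokens get a similarity-weighted combination of overlapping tokens' embeddings; the auxiliary vectors and the softmax weighting below stand in for the paper's actual similarity model and weighting scheme, and all data are random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins. src_vocab/src_emb: the pretrained (source) tokenizer and its
# embedding matrix; tgt_vocab: the new target-language tokenizer's vocabulary.
src_vocab = ["_the", "_to", "ing", "tion", "er"]
src_emb = rng.normal(size=(len(src_vocab), 8))
tgt_vocab = ["_the", "ing", "_maji", "_habari"]

# Auxiliary static vectors for all target tokens (a stand-in for vectors
# trained on target-language text).
aux = {tok: rng.normal(size=16) for tok in set(src_vocab) | set(tgt_vocab)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

overlap = [t for t in tgt_vocab if t in src_vocab]
tgt_emb = np.zeros((len(tgt_vocab), src_emb.shape[1]))

for i, tok in enumerate(tgt_vocab):
    if tok in src_vocab:
        # Overlapping tokens keep their pretrained embedding.
        tgt_emb[i] = src_emb[src_vocab.index(tok)]
    else:
        # New tokens: similarity-weighted combination of overlapping tokens'
        # pretrained embeddings (plain softmax weights in this sketch).
        sims = np.array([cosine(aux[tok], aux[o]) for o in overlap])
        weights = np.exp(sims) / np.exp(sims).sum()
        tgt_emb[i] = sum(w * src_emb[src_vocab.index(o)]
                         for w, o in zip(weights, overlap))
```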

Dynamic language models for continuously evolving content

S Amba Hombaiah, T Chen, M Zhang… - Proceedings of the 27th …, 2021 - dl.acm.org
The content on the web is in a constant state of flux. New entities, issues, and ideas
continuously emerge, while the semantics of the existing conversation topics gradually shift …

Inexpensive domain adaptation of pretrained language models: Case studies on biomedical NER and COVID-19 QA

N Poerner, U Waltinger, H Schütze - arXiv preprint arXiv:2004.03354, 2020 - arxiv.org
Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by
unsupervised pretraining on target-domain text. While successful, this approach is …

SwahBERT: Language model of Swahili

G Martin, ME Mswahili, YS Jeong… - Proceedings of the 2022 …, 2022 - aclanthology.org
The rapid development of social networks, electronic commerce, mobile Internet, and other
technologies has influenced the growth of Web data. Social media and Internet forums are …

OFA: A framework of initializing unseen subword embeddings for efficient large-scale multilingual continued pretraining

Y Liu, P Lin, M Wang, H Schütze - arXiv preprint arXiv:2311.08849, 2023 - arxiv.org
Instead of pretraining multilingual language models from scratch, a more efficient method is
to adapt existing pretrained language models (PLMs) to new languages via vocabulary …

Local structure matters most: Perturbation study in NLU

L Clouatre, P Parthasarathi, A Zouaq… - Findings of the …, 2022 - aclanthology.org
Recent research analyzing the sensitivity of natural language understanding models to word-
order perturbations has shown that neural models are surprisingly insensitive to the order of …
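
For context on what "word-order perturbations" typically involve, here is an illustrative pair of shuffling helpers, one destroying global order and one scrambling only small local windows; this is a generic sketch of such perturbations, not the paper's exact protocol:

```python
import random

def shuffle_globally(tokens, seed=0):
    """Destroy all word order: a full random permutation of the tokens."""
    rng = random.Random(seed)
    out = tokens[:]
    rng.shuffle(out)
    return out

def shuffle_locally(tokens, window=2, seed=0):
    """Keep global structure but scramble order inside small windows,
    so only local neighborhoods are perturbed."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

sent = "the movie was surprisingly good despite its slow start".split()
print(shuffle_globally(sent))
print(shuffle_locally(sent, window=2))
```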