XLM-V: Overcoming the vocabulary bottleneck in multilingual masked language models
Large multilingual language models typically rely on a single vocabulary shared across
100+ languages. As these models have increased in parameter count and depth …
E-BERT: Efficient-yet-effective entity embeddings for BERT
We present a novel way of injecting factual knowledge about entities into the pretrained
BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al …
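The snippet names the paper's core move: entity vectors from Wikipedia2Vec are aligned with BERT's wordpiece embedding space. Below is a minimal sketch of one way such an alignment can be done, assuming a linear map fitted by least squares on words present in both vocabularies; all array names are illustrative, not the paper's code.

```python
# Hedged sketch: align an external entity-vector space with BERT's wordpiece
# embedding space via a linear map fitted on words shared by both vocabularies.
import numpy as np

def fit_alignment(wiki2vec_word_vecs, bert_word_embeddings):
    """Least-squares fit of W such that wiki2vec_word_vecs @ W approximates
    bert_word_embeddings; row i of both matrices refers to the same word."""
    W, *_ = np.linalg.lstsq(wiki2vec_word_vecs, bert_word_embeddings, rcond=None)
    return W  # shape: (d_wiki2vec, d_bert)

def project_entities(wiki2vec_entity_vecs, W):
    """Map entity vectors into BERT's input embedding space with the fitted W,
    so they can be fed to the model like ordinary wordpiece embeddings."""
    return wiki2vec_entity_vecs @ W
```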
exBERT: Extending pre-trained models with domain-specific vocabulary under constrained training resources
We introduce exBERT, a training method to extend BERT pre-trained models from a general
domain to a new pre-trained model for a specific domain with a new additive vocabulary …
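The mechanism named in the snippet is an additive domain vocabulary on top of a general-domain BERT. The following is a simplified sketch of vocabulary extension using the Hugging Face API; exBERT's own extension-module architecture is more involved than a plain embedding resize, and the domain tokens below are made-up examples.

```python
# Hedged sketch: grow a pretrained BERT's vocabulary with domain-specific tokens.
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical domain terms; in practice they would come from a tokenizer
# trained on the target-domain corpus.
domain_tokens = ["angioplasty", "thrombectomy", "stent-graft"]
tokenizer.add_tokens(domain_tokens)

# Newly added embedding rows start from a random initialization and are then
# learned during continued pretraining on domain text.
model.resize_token_embeddings(len(tokenizer))
```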
Taming pre-trained language models with n-gram representations for low-resource domain adaptation
Large pre-trained models such as BERT are known to improve different downstream NLP
tasks, even when such a model is trained on a generic domain. Moreover, recent studies …
FOCUS: Effective embedding initialization for monolingual specialization of multilingual models
Using model weights pretrained on a high-resource language as a warm start can reduce
the need for data and compute to obtain high-quality language models for other, especially …
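The title points at the technical contribution: how to initialize embeddings for target-language tokens that the warm-start model has never seen. Below is a rough sketch of the general idea of similarity-weighted initialization from overlapping tokens; the similarity source, the softmax weighting, and all names are assumptions, not FOCUS's exact procedure.

```python
# Hedged sketch: initialize a new token's embedding as a convex combination of
# source-model embeddings of overlapping ("anchor") tokens, weighted by how
# similar the new token is to each anchor in some auxiliary embedding space.
import numpy as np

def init_new_token(anchor_embeddings, similarities, top_k=10):
    """anchor_embeddings: (n_anchors, d) rows from the source embedding matrix.
    similarities:        (n_anchors,) similarity of the new token to each anchor."""
    idx = np.argsort(similarities)[-top_k:]   # keep the most similar anchors
    weights = np.exp(similarities[idx])       # softmax over the top-k scores
    weights /= weights.sum()
    return weights @ anchor_embeddings[idx]   # (d,) initialization vector
```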
Dynamic language models for continuously evolving content
The content on the web is in a constant state of flux. New entities, issues, and ideas
continuously emerge, while the semantics of the existing conversation topics gradually shift …
Inexpensive domain adaptation of pretrained language models: Case studies on biomedical NER and COVID-19 QA
Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by
unsupervised pretraining on target-domain text. While successful, this approach is …
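The snippet describes the standard recipe the paper positions itself against: continued unsupervised (masked-language-model) pretraining on target-domain text. Below is a minimal Hugging Face sketch of that baseline recipe; the corpus path and hyperparameters are placeholders.

```python
# Hedged sketch of continued MLM pretraining on a target-domain corpus.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# "domain_corpus.txt" is a placeholder for the target-domain text.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="domain-adapted-bert",
                         num_train_epochs=1, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```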
SwahBERT: Language model of Swahili
The rapid development of social networks, electronic commerce, mobile Internet, and other
technologies has influenced the growth of Web data. Social media and Internet forums are …
OFA: A framework of initializing unseen subword embeddings for efficient large-scale multilingual continued pretraining
Instead of pretraining multilingual language models from scratch, a more efficient method is
to adapt existing pretrained language models (PLMs) to new languages via vocabulary …
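The snippet sketches the setting: keep a pretrained model's body and adapt its vocabulary to new languages, which leaves the question of what to put in the embedding rows of subwords the old vocabulary never contained. The sketch below is a generic illustration of that setup; OFA's actual factorized, similarity-based initialization is more sophisticated, and every name here is illustrative.

```python
# Hedged sketch: assemble an embedding matrix for a new vocabulary by copying
# rows for subwords shared with the old vocabulary and delegating unseen
# subwords to some initializer.
import numpy as np

def build_new_embedding_matrix(old_vocab, new_vocab, old_embeddings, init_unseen):
    """old_vocab / new_vocab: token -> row index; init_unseen(token, dim) -> vector."""
    dim = old_embeddings.shape[1]
    new_embeddings = np.empty((len(new_vocab), dim), dtype=old_embeddings.dtype)
    for token, new_id in new_vocab.items():
        if token in old_vocab:                              # shared subword: copy
            new_embeddings[new_id] = old_embeddings[old_vocab[token]]
        else:                                               # unseen subword
            new_embeddings[new_id] = init_unseen(token, dim)
    return new_embeddings
```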
Local structure matters most: Perturbation study in NLU
Recent research analyzing the sensitivity of natural language understanding models to word-
order perturbations has shown that neural models are surprisingly insensitive to the order of …
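The finding concerns how models react to word-order perturbations, which hinges on the distinction between local and global order. Below is a purely illustrative sketch of two perturbations at different scales: shuffling inside small windows (local order disrupted, coarse structure kept) versus shuffling the whole sequence; it is not the paper's exact experimental setup.

```python
# Hedged sketch: local vs. global word-order perturbations.
import random

def shuffle_within_windows(tokens, window=3, seed=0):
    """Perturb local order: shuffle tokens inside consecutive small windows while
    each window stays in place, so the coarse global arrangement is kept."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

def shuffle_globally(tokens, seed=0):
    """Shuffle the entire sequence, disrupting word order at every scale."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out
```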