Large language models for software engineering: A systematic literature review
Large Language Models (LLMs) have significantly impacted numerous domains, including
Software Engineering (SE). Many recent publications have explored LLMs applied to …
A survey on data selection for language models
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …
Doremi: Optimizing data mixtures speeds up language model pretraining
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly
affect language model (LM) performance. In this paper, we propose Domain Reweighting …
Data selection for language models via importance resampling
Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and
domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting …
On the dangers of stochastic parrots: Can language models be too big?🦜
The past 3 years of work in NLP have been characterized by the development and
deployment of ever larger language models, especially for English. BERT, its variants, GPT …
A survey on curriculum learning
Curriculum learning (CL) is a training strategy that trains a machine learning model from
easier data to harder data, which imitates the meaningful learning order in human curricula …
Don't stop pretraining: Adapt language models to domains and tasks
Language models pretrained on text from a wide variety of sources form the foundation of
today's NLP. In light of the success of these broad-coverage models, we investigate whether …
Skill-it! A data-driven skills framework for understanding and training language models
The quality of training data impacts the performance of pre-trained large language models
(LMs). Given a fixed budget of tokens, we study how to best select data that leads to good …
Survey of low-resource machine translation
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …
Neural unsupervised domain adaptation in NLP: A survey
Deep neural networks excel at learning from labeled data and achieve state-of-the-art
results on a wide array of Natural Language Processing tasks. In contrast, learning from …