A survey on data selection for language models
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …
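A hedged illustration of what "data selection" can mean in practice: a few heuristic quality filters of the kind such surveys catalogue. All threshold values and the toy corpus below are illustrative assumptions, not figures from the paper.

def keep_document(text: str) -> bool:
    # Length filter: drop very short or extremely long documents.
    words = text.split()
    if not 50 <= len(words) <= 100_000:
        return False
    # Mean word-length filter: screens out gibberish and token noise.
    mean_len = sum(len(w) for w in words) / len(words)
    if not 3 <= mean_len <= 10:
        return False
    # Alphabetic-ratio filter: drops symbol-heavy boilerplate.
    alpha = sum(c.isalpha() for c in text) / max(len(text), 1)
    return alpha >= 0.6

corpus = ["short", "a longer document " * 30]
print([keep_document(doc) for doc in corpus])  # [False, True]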
A survey of text classification with transformers: How wide? How large? How long? How accurate? How expensive? How safe?
Text classification in natural language processing (NLP) is evolving rapidly, particularly with
the surge in transformer-based models, including large language models (LLMs). This paper …
A survey of large language models
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …
C-Pack: Packed resources for general Chinese embeddings
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …
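A minimal usage sketch for the kind of Chinese text-embedding models C-Pack is built around, assuming one of the publicly released BGE checkpoints (the model name below is an assumption about what is available, not something stated in the snippet).

from sentence_transformers import SentenceTransformer, util

# Load an assumed BGE Chinese embedding checkpoint.
model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
sentences = ["今天天气很好", "今天阳光明媚", "股市大幅下跌"]
emb = model.encode(sentences, normalize_embeddings=True)
print(util.cos_sim(emb[0], emb[1]).item())  # near-paraphrase pair: high similarity
print(util.cos_sim(emb[0], emb[2]).item())  # unrelated pair: lower similarity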
Phi-3 technical report: A highly capable language model locally on your phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …
Textbooks are all you need
We introduce phi-1, a new large language model for code, with significantly smaller size
than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained …
RWKV: Reinventing RNNs for the transformer era
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
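A minimal sketch (not RWKV itself) of the scaling contrast the abstract refers to: self-attention materializes an n-by-n score matrix, so memory and compute grow quadratically with sequence length, while an RNN-style update carries one fixed-size state and runs in linear time. All shapes and names are illustrative assumptions.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # The (n, n) score matrix is the quadratic bottleneck.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def rnn_style(x, W):
    # One fixed-size hidden state per step: O(n) time, O(1) state memory.
    h = np.zeros(W.shape[0])
    out = []
    for x_t in x:
        h = np.tanh(W @ h + x_t)
        out.append(h)
    return np.stack(out)

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (512, 64), computed via a 512x512 score matrix
print(rnn_style(rng.normal(size=(n, d)), rng.normal(size=(d, d))).shape)  # (512, 64)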
Scalable extraction of training data from (production) language models
This paper studies extractable memorization: training data that an adversary can efficiently
extract by querying a machine learning model without prior knowledge of the training …
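A minimal sketch of the general extraction test the abstract describes: feed a model a prefix and check whether greedy decoding reproduces a candidate training string verbatim. The model choice and the candidate string below are placeholder assumptions, not the paper's setup.

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def greedy_continuation(prefix: str, max_new_tokens: int = 32) -> str:
    ids = tok(prefix, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

candidate = "the quick brown fox jumps over the lazy dog"  # hypothetical training string
prefix, suffix = candidate[:24], candidate[24:]
# If the model emits the exact suffix, the string counts as extractable.
print(greedy_continuation(prefix).lstrip().startswith(suffix.strip()))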
StarCoder 2 and The Stack v2: The next generation
The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …
Crosslingual generalization through multitask finetuning
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …
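A toy sketch of the "prompted" part of multitask prompted finetuning: each labelled example is cast into a natural-language instruction/response pair before finetuning. The template wording below is an illustrative assumption, not a template from the paper.

def to_prompted_pair(premise: str, hypothesis: str, label: str) -> dict:
    # Cast an NLI example into a text-to-text training pair.
    prompt = (f"Premise: {premise}\nHypothesis: {hypothesis}\n"
              "Does the premise entail the hypothesis? Yes or no?")
    return {"input": prompt, "target": "Yes" if label == "entailment" else "No"}

print(to_prompted_pair("A dog runs across the yard.", "An animal is moving.", "entailment"))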