Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
Leakage and the reproducibility crisis in machine-learning-based science
Machine-learning (ML) methods have gained prominence in the quantitative sciences.
However, there are many known methodological pitfalls, including data leakage, in ML …
A taxonomy and review of generalization research in NLP
The ability to generalize well is one of the primary desiderata for models of natural language
processing (NLP), but what 'good generalization' entails and how it should be evaluated is …
Impact of pretraining term frequencies on few-shot reasoning
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical
reasoning by extrapolating from a few examples in few-shot settings. However, the extent to …
SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)
We present the results and main findings of SemEval-2020 Task 12 on Multilingual
Offensive Language Identification in Social Media (OffensEval 2020). The task involves …
Show your work: Improved reporting of experimental results
Research in natural language processing proceeds, in part, by demonstrating that new
models achieve superior performance (e.g., accuracy) on held-out test data, compared to …
Probing toxic content in large pre-trained language models
Large pre-trained language models (PTLMs) have been shown to carry biases towards
different social groups, which leads to the reproduction of stereotypical and toxic content by …
Toxicity detection: Does context really matter?
Moderation is crucial to promoting healthy on-line discussions. Although several 'toxicity'
detection datasets and models have been published, most of them ignore the context of the …
MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal
documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 …
On the value of out-of-distribution testing: An example of Goodhart's law
Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine
learning system's ability to generalize beyond the biases of a training set. OOD benchmarks …