Are we learning yet? A meta review of evaluation failures across machine learning
Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …
Pre-trained transformers: an empirical comparison
Pre-trained transformers have rapidly become very popular in the Natural Language
Processing (NLP) community, surpassing the previous state of the art in a wide variety of …
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …
An introduction to deep learning in natural language processing: Models, techniques, and tools
Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …
Dynabench: Rethinking benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
Underspecification presents challenges for credibility in modern machine learning
Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are
deployed in real-world domains. We identify underspecification in ML pipelines as a key …
WILDS: A benchmark of in-the-wild distribution shifts
Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …
Tandem mass spectrum prediction for small molecules using graph transformers
Tandem mass spectra capture fragmentation patterns that provide key structural information
about molecules. Although mass spectrometry is applied in many areas, the vast majority of …
HateCheck: Functional tests for hate speech detection models
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …
Towards debiasing NLU models from unknown biases
NLU models often exploit biases to achieve high dataset-specific performance without
properly learning the intended task. Recently proposed debiasing methods are shown to be …