The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …
Learning from disagreement: A survey
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …
Computational models of anaphora
Interpreting anaphoric references is a fundamental aspect of our language competence that
has long attracted the attention of computational linguists. The appearance of ever-larger …
Inherent disagreements in human textual inferences
We analyze human's disagreements about the validity of natural language inferences. We
show that, very often, disagreements are not dismissible as annotation “noise”, but rather …
What will it take to fix benchmarking in natural language understanding?
Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and
biased systems score so highly on standard benchmarks that there is little room for …
Why don't you do it right? Analysing annotators' disagreement in subjective tasks
Annotators' disagreement in linguistic data has been recently the focus of multiple initiatives
aimed at raising awareness on issues related to 'majority voting' when aggregating diverging …
SemEval-2023 task 11: Learning with disagreements (LeWiDi)
NLP datasets annotated with human judgments are rife with disagreements between the
judges. This is especially true for tasks depending on subjective judgments such as …
Investigating reasons for disagreement in natural language inference
We investigate how disagreement in natural language inference (NLI) annotation arises. We
developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level …
Quoref: A reading comprehension dataset with questions requiring coreferential reasoning
Machine comprehension of texts longer than a single sentence often requires coreference
resolution. However, most current reading comprehension benchmarks do not contain …
What can we learn from collective human opinions on natural language inference data?
Despite the subjective nature of many NLP tasks, most NLU evaluations have focused on
using the majority label with presumably high agreement as the ground truth. Less attention …