"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset
As language models grow in popularity, it becomes increasingly important to clearly
measure all possible markers of demographic identity in order to avoid perpetuating existing …
Designing responsible AI: Adaptations of UX practice to meet responsible AI challenges
Technology companies continue to invest in efforts to incorporate responsibility in their
Artificial Intelligence (AI) advancements, while efforts to audit and regulate AI systems …
QuALITY: Question answering with long input texts, yes!
To enable building and testing models on long-document comprehension, we introduce
QuALITY, a multiple-choice QA dataset with context passages in English that have an …
WANLI: Worker and AI collaboration for natural language inference dataset creation
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …
WebQA: Multihop and multimodal QA
Y Chang, M Narang, H Suzuki… - Proceedings of the …, 2022 - openaccess.thecvf.com
Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature
of web searches, requires fundamental advances in visual representation learning …
Don't blame the annotator: Bias already starts in the annotation instructions
In recent years, progress in NLU has been driven by benchmarks. These benchmarks are
typically collected by crowdsourcing, where annotators write examples based on annotation …
CREAK: A dataset for commonsense reasoning over entity knowledge
Most benchmark datasets targeting commonsense reasoning focus on everyday scenarios:
physical knowledge like knowing that you could fill a cup under a waterfall [Talmor et al …
Multimodal large language models for inclusive collaboration learning tasks
A Lewis - Proceedings of the 2022 Conference of the North …, 2022 - aclanthology.org
This PhD project leverages advancements in multimodal large language models to build an
inclusive collaboration feedback loop, in order to facilitate the automated detection …
Analyzing dynamic adversarial training data in the limit
To create models that are robust across a wide range of test inputs, training datasets should
include diverse examples that span numerous phenomena. Dynamic adversarial data …
A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains
Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the
prominent approach for complex reasoning tasks, where more accurate reasoning chains …