Google Scholar

"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset

EM Smith, M Hall, M Kambadur, E Presani… - arXiv preprint arXiv …, 2022 - arxiv.org

As language models grow in popularity, it becomes increasingly important to clearly
measure all possible markers of demographic identity in order to avoid perpetuating existing …

Cited by 149 · All 3 versions

[PDF] acm.org

Designing responsible AI: Adaptations of UX practice to meet responsible AI challenges

Q Wang, M Madaio, S Kane, S Kapania… - Proceedings of the …, 2023 - dl.acm.org

Technology companies continue to invest in efforts to incorporate responsibility in their
Artificial Intelligence (AI) advancements, while efforts to audit and regulate AI systems …

Cited by 64 · All 4 versions

[PDF] arxiv.org

QuALITY: Question answering with long input texts, yes!

RY Pang, A Parrish, N Joshi, N Nangia, J Phang… - arXiv preprint arXiv …, 2021 - arxiv.org

To enable building and testing models on long-document comprehension, we introduce
QuALITY, a multiple-choice QA dataset with context passages in English that have an …

Cited by 126 · All 8 versions

[PDF] arxiv.org

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org

A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

Cited by 219 · All 5 versions

[PDF] thecvf.com

WebQA: Multihop and multimodal QA

Y Chang, M Narang, H Suzuki… - Proceedings of the …, 2022 - openaccess.thecvf.com

Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature
of web searches, requires fundamental advances in visual representation learning …

Cited by 90 · All 7 versions

[PDF] arxiv.org

Don't blame the annotator: Bias already starts in the annotation instructions

M Parmar, S Mishra, M Geva, C Baral - arXiv preprint arXiv:2205.00415, 2022 - arxiv.org

In recent years, progress in NLU has been driven by benchmarks. These benchmarks are
typically collected by crowdsourcing, where annotators write examples based on annotation …

Cited by 64 · All 7 versions

[PDF] arxiv.org

CREAK: A dataset for commonsense reasoning over entity knowledge

Y Onoe, MJQ Zhang, E Choi, G Durrett - arXiv preprint arXiv:2109.01653, 2021 - arxiv.org

Most benchmark datasets targeting commonsense reasoning focus on everyday scenarios:
physical knowledge like knowing that you could fill a cup under a waterfall [Talmor et al …

Cited by 68 · All 4 versions

[PDF] aclanthology.org

Multimodal large language models for inclusive collaboration learning tasks

A Lewis - Proceedings of the 2022 Conference of the North …, 2022 - aclanthology.org

This PhD project leverages advancements in multimodal large language models to build an
inclusive collaboration feedback loop, in order to facilitate the automated detection …

Cited by 29 · All 4 versions

[PDF] arxiv.org

Analyzing dynamic adversarial training data in the limit

E Wallace, A Williams, R Jia, D Kiela - arXiv preprint arXiv:2110.08514, 2021 - arxiv.org

To create models that are robust across a wide range of test inputs, training datasets should
include diverse examples that span numerous phenomena. Dynamic adversarial data …

Cited by 40 · All 4 versions

[PDF] arxiv.org

A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains

A Jacovi, Y Bitton, B Bohnet, J Herzig… - arXiv preprint arXiv …, 2024 - arxiv.org

Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the
prominent approach for complex reasoning tasks, where more accurate reasoning chains …

Cited by 11 · All 8 versions


Saved to My Library

What ingredients make for an effective crowdsourcing protocol for difficult NLU data collection...

"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset