Google Scholar

Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu

The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Cited by 633

Improving the reliability of deep neural networks in NLP: A review

B Alshemali, J Kalita - Knowledge-Based Systems, 2020 - Elsevier

Deep learning models have achieved great success in solving a variety of natural language
processing (NLP) problems. An ever-growing body of research, however, illustrates the …

Cited by 235

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Cited by 413

Promptbench: Towards evaluating the robustness of large language models on adversarial prompts

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arxiv e …, 2023 - ui.adsabs.harvard.edu

The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Cited by 253

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arxiv preprint arxiv …, 2021 - arxiv.org

We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Cited by 429

Underspecification presents challenges for credibility in modern machine learning

A D'Amour, K Heller, D Moldovan, B Adlam… - Journal of Machine …, 2022 - jmlr.org

Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are
deployed in real-world domains. We identify underspecification in ML pipelines as a key …

Cited by 828

Causal inference in natural language processing: Estimation, prediction, interpretation and beyond

A Feder, KA Keith, E Manzoor, R Pryzant… - Transactions of the …, 2022 - direct.mit.edu

A fundamental goal of scientific research is to learn about causal relationships. However,
despite its critical role in the life and social sciences, causality has not had the same …

Cited by 272

mGPT: Few-shot learners go multilingual

O Shliazhko, A Fenogenova, M Tikhonova… - Transactions of the …, 2024 - direct.mit.edu

This paper introduces mGPT, a multilingual variant of GPT-3, pretrained on 61 languages
from 25 linguistically diverse language families using Wikipedia and the C4 Corpus. We …

Cited by 150

Adversarial NLI: A new benchmark for natural language understanding

Y Nie, A Williams, E Dinan, M Bansal, J Weston… - arxiv preprint arxiv …, 2019 - arxiv.org

We introduce a new large-scale NLI benchmark dataset, collected via an iterative,
adversarial human-and-model-in-the-loop procedure. We show that training models on this …

Cited by 1025

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

JH Clark, E Choi, M Collins, D Garrette… - Transactions of the …, 2020 - direct.mit.edu

Confidently making progress on multilingual modeling requires challenging, trustworthy
evaluations. We present TyDi QA—a question answering dataset covering 11 typologically …

Cited by 583
