- Academic Search

T Liao, R Taori, ID Raji, L Schmidt - Thirty-fifth Conference on …, 2021 - openreview.net

Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …

Cited by 130; all 8 versions

[HTML] sciencedirect.com

[HTML] Pre-trained transformers: an empirical comparison

S Casola, I Lauriola, A Lavelli - Machine Learning with Applications, 2022 - Elsevier

Pre-trained transformers have rapidly become very popular in the Natural Language
Processing (NLP) community, surpassing the previous state of the art in a wide variety of …

Cited by 59; all 3 versions

[PDF] arxiv.org

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

P Röttger, HR Kirk, B Vidgen, G Attanasio… - arxiv preprint arxiv …, 2023 - arxiv.org

Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …

Cited by 143; all 6 versions

An introduction to deep learning in natural language processing: Models, techniques, and tools

I Lauriola, A Lavelli, F Aiolli - Neurocomputing, 2022 - Elsevier

Abstract Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …

Cited by 583; all 6 versions

[PDF] arxiv.org

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arxiv preprint arxiv …, 2021 - arxiv.org

We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Cited by 429; all 8 versions

[PDF] jmlr.org

Underspecification presents challenges for credibility in modern machine learning

A D'Amour, K Heller, D Moldovan, B Adlam… - Journal of Machine …, 2022 - jmlr.org

Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are
deployed in real-world domains. We identify underspecification in ML pipelines as a key …

Cited by 828; all 8 versions

[PDF] mlr.press

WILDS: A benchmark of in-the-wild distribution shifts

PW Koh, S Sagawa, H Marklund… - International …, 2021 - proceedings.mlr.press

Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …

Cited by 1573; all 13 versions

Tandem mass spectrum prediction for small molecules using graph transformers

A Young, H Röst, B Wang - Nature Machine Intelligence, 2024 - nature.com

Tandem mass spectra capture fragmentation patterns that provide key structural information
about molecules. Although mass spectrometry is applied in many areas, the vast majority of …

Cited by 20

[PDF] arxiv.org

HateCheck: Functional tests for hate speech detection models

P Röttger, B Vidgen, D Nguyen, Z Waseem… - arxiv preprint arxiv …, 2020 - arxiv.org

Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …

Cited by 272; all 8 versions

[PDF] arxiv.org

Towards debiasing NLU models from unknown biases

PA Utama, NS Moosavi, I Gurevych - arxiv preprint arxiv:2009.12303, 2020 - arxiv.org

NLU models often exploit biases to achieve high dataset-specific performance without
properly learning the intended task. Recently proposed debiasing methods are shown to be …

Cited by 156; all 4 versions

The curse of performance instability in analysis datasets: Consequences, source, and suggestions

Are we learning yet? a meta review of evaluation failures across machine learning
