- Academic Search

The neuroconnectionist research programme

A Doerig, RP Sommers, K Seeliger… - Nature Reviews …, 2023 - nature.com

Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to
model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have …

Cited by 152

Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com

In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

Cited by 665

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org

Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …

Cited by 194

AI and the everything in the whole wide world benchmark

ID Raji, EM Bender, A Paullada, E Denton… - arXiv preprint arXiv …, 2021 - arxiv.org

There is a tendency across different subfields in AI to valorize a small collection of influential
benchmarks. These benchmarks operate as stand-ins for a range of anointed common …

Cited by 318

Reduced, reused and recycled: The life of a dataset in machine learning research

B Koch, E Denton, A Hanna, JG Foster - arXiv preprint arXiv:2112.01716, 2021 - arxiv.org

Benchmark datasets play a central role in the organization of machine learning research.
They coordinate researchers around shared research problems and serve as a measure of …

Cited by 160

Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org

More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Cited by 58

Evaluation gaps in machine learning practice

B Hutchinson, N Rostamzadeh, C Greer… - Proceedings of the …, 2022 - dl.acm.org

Forming a reliable judgement of a machine learning (ML) model's appropriateness for an
application ecosystem is critical for its responsible use, and requires considering a broad …

Cited by 63

Position: Key claims in LLM research have a long tail of footnotes

A Rogers, S Luccioni - Forty-first International Conference on …, 2024 - openreview.net

Much of the recent discourse within the ML community has been centered around Large
Language Models (LLMs), their functionality and potential--yet not only do we not have a …

Cited by 9

Evaluation examples are not equally informative: How should that change NLP leaderboards?

P Rodriguez, J Barrow, AM Hoyle… - Proceedings of the …, 2021 - aclanthology.org

Leaderboards are widely used in NLP and push the field forward. While leaderboards are a
straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items …

Cited by 83

Underspecification in scene description-to-depiction tasks

B Hutchinson, J Baldridge, V Prabhakaran - arXiv preprint arXiv …, 2022 - arxiv.org

Questions regarding implicitness, ambiguity and underspecification are crucial for
understanding the task validity and ethical concerns of multimodal image+text systems, yet …

Cited by 35

Targeting the benchmark: On methodology in current natural language processing research

The neuroconnectionist research programme
