The neuroconnectionist research programme
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to
model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have …
Data and its (dis)contents: A survey of dataset development and use in machine learning research
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …
AI and the everything in the whole wide world benchmark
There is a tendency across different subfields in AI to valorize a small collection of influential
benchmarks. These benchmarks operate as stand-ins for a range of anointed common …
Reduced, reused and recycled: The life of a dataset in machine learning research
Benchmark datasets play a central role in the organization of machine learning research.
They coordinate researchers around shared research problems and serve as a measure of …
Benchmarks for automated commonsense reasoning: A survey
E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …
Evaluation gaps in machine learning practice
Forming a reliable judgement of a machine learning (ML) model's appropriateness for an
application ecosystem is critical for its responsible use, and requires considering a broad …
Position: Key claims in LLM research have a long tail of footnotes
Much of the recent discourse within the ML community has been centered around Large
Language Models (LLMs), their functionality and potential--yet not only do we not have a …
Evaluation examples are not equally informative: How should that change NLP leaderboards?
Leaderboards are widely used in NLP and push the field forward. While leaderboards are a
straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items …
Underspecification in scene description-to-depiction tasks
Questions regarding implicitness, ambiguity and underspecification are crucial for
understanding the task validity and ethical concerns of multimodal image+text systems, yet …