Scientific discovery in the age of artificial intelligence

H Wang, T Fu, Y Du, W Gao, K Huang, Z Liu… - Nature, 2023 - nature.com
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment
and accelerate research, hel** scientists to generate hypotheses, design experiments …

Advances, challenges and opportunities in creating data for trustworthy AI

W Liang, GA Tadesse, D Ho, L Fei-Fei… - Nature Machine …, 2022 - nature.com
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …

Datacomp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

Ask me anything: A simple strategy for prompting language models

S Arora, A Narayan, MF Chen, L Orr, N Guha… - arxiv preprint arxiv …, 2022 - arxiv.org
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a
natural language prompt that demonstrates how to perform the task and no additional …

Pervasive label errors in test sets destabilize machine learning benchmarks

CG Northcutt, A Athalye, J Mueller - arxiv preprint arxiv:2103.14749, 2021 - arxiv.org
We identify label errors in the test sets of 10 of the most commonly-used computer vision,
natural language, and audio datasets, and subsequently study the potential for these label …

Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - ACM Computing …, 2025 - dl.acm.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Confident learning: Estimating uncertainty in dataset labels

C Northcutt, L Jiang, I Chuang - Journal of Artificial Intelligence Research, 2021 - jair.org
Learning exists in the context of data, yet notions of confidence typically focus on model
predictions, not label quality. Confident learning (CL) is an alternative approach which …

Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis

D Karimi, H Dou, SK Warfield, A Gholipour - Medical image analysis, 2020 - Elsevier
Supervised training of deep learning models requires large labeled datasets. There is a
growing interest in obtaining such datasets for medical image analysis applications …

[HTML][HTML] A state-of-the-art survey on deep learning theory and architectures

MZ Alom, TM Taha, C Yakopcic, S Westberg, P Sidike… - electronics, 2019 - mdpi.com
In recent years, deep learning has garnered tremendous success in a variety of application
domains. This new field of machine learning has been growing rapidly and has been …

A survey on data collection for machine learning: a big data-ai integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …