Advances, challenges and opportunities in creating data for trustworthy AI

W Liang, GA Tadesse, D Ho, L Fei-Fei… - Nature Machine …, 2022 - nature.com
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …

Machine learning for high-entropy alloys: Progress, challenges and opportunities

X Liu, J Zhang, Z Pei - Progress in Materials Science, 2023 - Elsevier
High-entropy alloys (HEAs) have attracted extensive interest due to their exceptional
mechanical properties and the vast compositional space for new HEAs. However …

Segment Anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

Beyond neural scaling laws: beating power law scaling via data pruning

B Sorscher, R Geirhos, S Shekhar… - Advances in …, 2022 - proceedings.neurips.cc
Widely observed neural scaling laws, in which error falls off as a power of the training set
size, model size, or both, have driven substantial performance improvements in deep …

Intriguing properties of vision transformers

MM Naseer, K Ranasinghe, SH Khan… - Advances in …, 2021 - proceedings.neurips.cc
Vision transformers (ViT) have demonstrated impressive performance across numerous
machine vision tasks. These models are based on multi-head self-attention mechanisms that …

The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence

K Crawford - 2021 - static10.labirint.ru
The hidden costs of artificial intelligence, from natural resources and labor to privacy and
freedom. What happens when artificial intelligence saturates political life and depletes the …

Multimodal datasets: misogyny, pornography, and malignant stereotypes

A Birhane, VU Prabhu, E Kahembwe - arXiv preprint arXiv:2110.01963, 2021 - arxiv.org
We have now entered the era of trillion parameter machine learning models trained on
billion-sized datasets scraped from the internet. The rise of these gargantuan datasets has …

Kubric: A scalable dataset generator

K Greff, F Belletti, L Beyer, C Doersch… - Proceedings of the …, 2022 - openaccess.thecvf.com
Data is the driving force of machine learning, with the amount and quality of training data
often being more important for the performance of a system than architecture and training …

SWAD: Domain generalization by seeking flat minima

J Cha, S Chun, K Lee, HC Cho… - Advances in Neural …, 2021 - proceedings.neurips.cc
Domain generalization (DG) methods aim to achieve generalizability to an unseen
target domain by using only training data from the source domains. Although a variety of DG …