Advances, challenges and opportunities in creating data for trustworthy AI

W Liang, GA Tadesse, D Ho, L Fei-Fei… - Nature Machine …, 2022 - nature.com
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …

A survey on data-efficient algorithms in big data era

A Adadi - Journal of Big Data, 2021 - Springer
The leading approaches in Machine Learning are notoriously data-hungry. Unfortunately,
many application domains do not have access to big data because acquiring data involves a …

Data collection and quality challenges in deep learning: A data-centric AI perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

An open source machine learning framework for efficient and transparent systematic reviews

R Van De Schoot, J De Bruin, R Schram… - Nature Machine …, 2021 - nature.com
To help researchers conduct a systematic review or meta-analysis as efficiently and
transparently as possible, we designed a tool to accelerate the step of screening titles and …

No subclass left behind: Fine-grained robustness in coarse-grained classification problems

N Sohoni, J Dunnmon, G Angus… - Advances in Neural …, 2020 - proceedings.neurips.cc
In real-world classification tasks, each class often comprises multiple finer-grained
"subclasses." As the subclass labels are frequently unavailable, models trained using only …

AutoML to date and beyond: Challenges and opportunities

SK Karmaker, MM Hassan, MJ Smith, L Xu… - ACM Computing …, 2021 - dl.acm.org
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to
make the most of their data, demand for machine learning tools has spurred researchers to …

CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT

A Smit, S Jain, P Rajpurkar, A Pareek, AY Ng… - arXiv preprint arXiv …, 2020 - arxiv.org
The extraction of labels from radiology text reports enables large-scale training of medical
imaging models. Existing approaches to report labeling typically rely either on sophisticated …

Fine-tuning pre-trained language model with weak supervision: A contrastive-regularized self-training approach

Y Yu, S Zuo, H Jiang, W Ren, T Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
Fine-tuned pre-trained language models (LMs) have achieved enormous success in many
natural language processing (NLP) tasks, but they still require excessive labeled data in the …

Named entity recognition without labelled data: A weak supervision approach

P Lison, A Hubin, J Barnes, S Touileb - arXiv preprint arXiv:2004.14723, 2020 - arxiv.org
Named Entity Recognition (NER) performance often degrades rapidly when applied to target
domains that differ from the texts observed during training. When in-domain labelled data is …

Designing ground truth and the social life of labels

M Muller, CT Wolf, J Andres, M Desmond… - Proceedings of the …, 2021 - dl.acm.org
Ground-truth labeling is an important activity in machine learning. Many studies have
examined how crowdworkers apply labels to records in machine learning datasets …