Advances, challenges and opportunities in creating data for trustworthy AI
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …
A survey on data‐efficient algorithms in big data era
A Adadi - Journal of Big Data, 2021 - Springer
The leading approaches in Machine Learning are notoriously data-hungry. Unfortunately,
many application domains do not have access to big data because acquiring data involves a …
Data collection and quality challenges in deep learning: A data-centric ai perspective
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …
An open source machine learning framework for efficient and transparent systematic reviews
R Van De Schoot, J De Bruin, R Schram… - Nature machine …, 2021 - nature.com
To help researchers conduct a systematic review or meta-analysis as efficiently and
transparently as possible, we designed a tool to accelerate the step of screening titles and …
No subclass left behind: Fine-grained robustness in coarse-grained classification problems
In real-world classification tasks, each class often comprises multiple finer-grained
"subclasses." As the subclass labels are frequently unavailable, models trained using only …
Automl to date and beyond: Challenges and opportunities
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to
make the most of their data, demand for machine learning tools has spurred researchers to …
CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT
The extraction of labels from radiology text reports enables large-scale training of medical
imaging models. Existing approaches to report labeling typically rely either on sophisticated …
Fine-tuning pre-trained language model with weak supervision: A contrastive-regularized self-training approach
Fine-tuned pre-trained language models (LMs) have achieved enormous success in many
natural language processing (NLP) tasks, but they still require excessive labeled data in the …
Named entity recognition without labelled data: A weak supervision approach
Named Entity Recognition (NER) performance often degrades rapidly when applied to target
domains that differ from the texts observed during training. When in-domain labelled data is …
Designing ground truth and the social life of labels
Ground-truth labeling is an important activity in machine learning. Many studies have
examined how crowdworkers apply labels to records in machine learning datasets …