Data-centric AI: Perspectives and challenges
The role of data in building AI systems has recently been significantly magnified by the
emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model …
Self-training: A survey
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …
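The core loop behind most self-training methods is compact enough to sketch. The following is a minimal pseudo-labeling sketch, not any specific algorithm from the survey; it assumes a scikit-learn-style classifier and a fixed confidence threshold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_rounds=5):
    """Iteratively pseudo-label the unlabeled pool and retrain on the result."""
    model = LogisticRegression(max_iter=1000)
    X_pool = np.asarray(X_unlabeled)
    for _ in range(max_rounds):
        model.fit(X_labeled, y_labeled)
        if len(X_pool) == 0:
            break
        proba = model.predict_proba(X_pool)
        keep = proba.max(axis=1) >= threshold            # trust only confident predictions
        if not keep.any():
            break                                        # nothing confident enough: stop
        pseudo_y = model.classes_[proba[keep].argmax(axis=1)]
        X_labeled = np.vstack([X_labeled, X_pool[keep]])
        y_labeled = np.concatenate([y_labeled, pseudo_y])
        X_pool = X_pool[~keep]                           # shrink the pool each round
    return model
```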
A survey on programmatic weak supervision
Labeling training data has become one of the major roadblocks to using machine learning.
Among various weak supervision paradigms, programmatic weak supervision (PWS) has …
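In PWS, users write labeling functions (LFs): noisy heuristics that each vote on a label or abstain, with a label model aggregating the votes. A minimal sketch using majority vote as the aggregator (production label models, e.g. Snorkel's, instead weight LFs by estimated accuracy); the toy LFs are illustrative assumptions:

```python
import numpy as np

ABSTAIN = -1

# Toy sentiment labeling functions: each returns a class id or ABSTAIN.
def lf_contains_great(text):  return 1 if "great" in text.lower() else ABSTAIN
def lf_contains_awful(text):  return 0 if "awful" in text.lower() else ABSTAIN
def lf_exclamations(text):    return 1 if text.count("!") >= 2 else ABSTAIN

LFS = [lf_contains_great, lf_contains_awful, lf_exclamations]

def majority_vote(texts, lfs=LFS, n_classes=2):
    """Aggregate LF votes per example; uncovered or tied examples stay ABSTAIN."""
    labels = []
    for t in texts:
        votes = np.array([lf(t) for lf in lfs])
        votes = votes[votes != ABSTAIN]
        if len(votes) == 0:
            labels.append(ABSTAIN)
            continue
        counts = np.bincount(votes, minlength=n_classes)
        unique_top = (counts == counts.max()).sum() == 1
        labels.append(counts.argmax() if unique_top else ABSTAIN)
    return np.array(labels)
```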
WRENCH: A comprehensive benchmark for weak supervision
Recent Weak Supervision (WS) approaches have had widespread success in easing the
bottleneck of labeling training data for machine learning by synthesizing labels from multiple …
Theoretical analysis of weak-to-strong generalization
Strong student models can learn from weaker teachers: when trained on the predictions of a
weaker model, a strong pretrained student can learn to correct the weak model's errors and …
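The setup the abstract describes reduces to a few lines: fit a weak teacher on a small labeled set, label a large pool with it, then train a higher-capacity student on those noisy labels. A sketch under the assumption that scikit-learn models stand in for the weak teacher and the pretrained student:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

def weak_to_strong(X_small, y_small, X_pool):
    """Train a low-capacity teacher, then a high-capacity student on its labels."""
    teacher = DecisionTreeClassifier(max_depth=2)    # deliberately weak
    teacher.fit(X_small, y_small)
    weak_labels = teacher.predict(X_pool)            # noisy supervision for the pool
    student = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500)
    student.fit(X_pool, weak_labels)                 # may generalize past teacher errors
    return teacher, student
```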
Meta self-training for few-shot neural sequence labeling
Neural sequence labeling is widely adopted for many Natural Language Processing (NLP)
tasks, such as Named Entity Recognition (NER) and slot tagging for dialog systems and …
Language models in the loop: Incorporating prompting into weak supervision
We propose a new strategy for applying large pre-trained language models to novel tasks
when labeled training data is limited. Rather than apply the model in a typical zero-shot or …
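As the snippet describes it, the strategy treats prompted model outputs as one more source of weak labels rather than as final zero-shot predictions. In the sketch below, `query_model` is a hypothetical stand-in for whatever LLM API is in use, and its free-text answers are mapped to label votes that feed the same aggregation as other labeling functions:

```python
ABSTAIN = -1
LABEL_MAP = {"positive": 1, "negative": 0}   # map free-text answers to class ids

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's API."""
    raise NotImplementedError

def make_prompted_lf(template):
    """Wrap a prompt template as a labeling function: class id or ABSTAIN."""
    def lf(text):
        answer = query_model(template.format(text=text)).strip().lower()
        return LABEL_MAP.get(answer, ABSTAIN)   # unparseable answers abstain
    return lf

lf_sentiment = make_prompted_lf(
    "Is the sentiment of this review positive or negative?\n{text}\nAnswer:"
)
```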
Characterizing the impacts of semi-supervised learning for weak supervision
Labeling training data is a critical and expensive step in producing high-accuracy ML
models, whether training from scratch or fine-tuning. To make labeling more efficient, two …
MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
Among the biggest challenges we face in utilizing neural networks trained on waveform (i.e.,
seismic, electromagnetic, or ultrasound) data is their application to real data. The requirement …
Training subset selection for weak supervision
Existing weak supervision approaches use all the data covered by weak signals to train a
classifier. We show both theoretically and empirically that this is not always optimal …
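The snippet doesn't state the paper's selection criterion, so the following only illustrates the general idea: instead of training on everything the weak signals cover, keep the subset where the aggregated label looks most reliable. Here reliability is approximated, as an assumption, by the agreement rate among non-abstaining labeling functions:

```python
import numpy as np

ABSTAIN = -1

def select_subset(vote_matrix, min_agreement=0.8):
    """Keep examples whose non-abstaining LFs mostly agree.

    vote_matrix: (n_examples, n_lfs) array of class ids or ABSTAIN.
    Returns indices of the selected examples and their majority labels.
    """
    keep_idx, labels = [], []
    for i, votes in enumerate(vote_matrix):
        votes = votes[votes != ABSTAIN]
        if len(votes) == 0:
            continue                               # uncovered: nothing to train on
        counts = np.bincount(votes)
        agreement = counts.max() / len(votes)      # fraction voting for the top class
        if agreement >= min_agreement:
            keep_idx.append(i)
            labels.append(counts.argmax())
    return np.array(keep_idx), np.array(labels)
```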