Active learning literature survey

B Settles - 2009 - minds.wisconsin.edu
The key idea behind active learning is that a machine learning algorithm can achieve
greater accuracy with fewer labeled training instances if it is allowed to choose the training …

Get another label? Improving data quality and data mining using multiple, noisy labelers

VS Sheng, F Provost, PG Ipeirotis - Proceedings of the 14th ACM …, 2008 - dl.acm.org
This paper addresses the repeated acquisition of labels for data items when the labeling is
imperfect. We examine the improvement (or lack thereof) in data quality via repeated …

Active learning: A survey

CC Aggarwal, X Kong, Q Gu, J Han, PS Yu - Data classification, 2014 - taylorfrancis.com
In all these cases, labels can be obtained, but only at a significant cost to the end user. An
important observation is that all records are not equally important from the perspective of …

Learning to maximize mutual information for dynamic feature selection

IC Covert, W Qiu, M Lu, NY Kim… - International …, 2023 - proceedings.mlr.press
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to
train models with static feature subsets. Here, we consider the dynamic feature selection …

EDDI: Efficient dynamic discovery of high-value information with partial VAE

C Ma, S Tschiatschek, K Palla… - arXiv preprint arXiv …, 2018 - arxiv.org
Many real-life decision-making situations allow further relevant information to be acquired at
a specific cost, for example, in assessing the health status of a patient we may decide to take …

Creating diversity in ensembles using artificial data

P Melville, RJ Mooney - Information Fusion, 2005 - Elsevier
The diversity of an ensemble of classifiers is known to be an important factor in determining
its generalization error. We present a new method for generating ensembles, Decorate …

VAEM: a deep generative model for heterogeneous mixed type data

C Ma, S Tschiatschek, R Turner… - Advances in …, 2020 - proceedings.neurips.cc
Deep generative models often perform poorly in real-world applications due to the
heterogeneity of natural data sets. Heterogeneity arises from data containing different types …

Repeated labeling using multiple noisy labelers

PG Ipeirotis, F Provost, VS Sheng, J Wang - Data Mining and Knowledge …, 2014 - Springer
This paper addresses the repeated acquisition of labels for data items when the labeling is
imperfect. We examine the improvement (or lack thereof) in data quality via repeated …

Bayesian co-training

S Yu, B Krishnapuram, H Steck… - Advances in neural …, 2007 - proceedings.neurips.cc
We propose a Bayesian undirected graphical model for co-training, or more generally for
semi-supervised multi-view learning. This makes explicit the previously unstated …

Active learning: an empirical study of common baselines

ME Ramirez-Loaiza, M Sharma, G Kumar… - Data mining and …, 2017 - Springer
Most of the empirical evaluations of active learning approaches in the literature have
focused on a single classifier and a single performance measure. We present an extensive …