An overview of overfitting and its solutions

X Ying - Journal of physics: Conference series, 2019 - iopscience.iop.org
Overfitting is a fundamental issue in supervised machine learning which prevents us from
perfectly generalizing the models to well fit observed data on training data, as well as …

Machine learning approaches in microbiome research: challenges and best practices

G Papoutsoglou, S Tarazona, MB Lopes… - Frontiers in …, 2023 - frontiersin.org
Microbiome data predictive analysis within a machine learning (ML) workflow presents
numerous domain-specific challenges involving preprocessing, feature selection, predictive …

Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it

MJ Denny, A Spirling - Political analysis, 2018 - cambridge.org
Despite the popularity of unsupervised techniques for political science text-as-data research,
the importance and implications of preprocessing decisions in this domain have received …

[BOOK][B] Evaluating learning algorithms: a classification perspective

N Japkowicz, M Shah - 2011 - books.google.com
The field of machine learning has matured to the point where many sophisticated learning
approaches can be applied to practical applications. Thus it is of critical importance that …

[BOOK][B] Data Science for Business: What you need to know about data mining and data-analytic thinking

F Provost, T Fawcett - 2013 - books.google.com
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for
Business introduces the fundamental principles of data science, and walks you through the …

Unbiased recursive partitioning: A conditional inference framework

T Hothorn, K Hornik, A Zeileis - Journal of Computational and …, 2006 - Taylor & Francis
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental
problems of exhaustive search procedures usually applied to fit such models have been …

Efficient feature selection via analysis of relevance and redundancy

L Yu, H Liu - Journal of machine learning research, 2004 - jmlr.org
Feature selection is applied to reduce the number of features in many applications where
data has hundreds or thousands of features. Existing feature selection methods mainly focus …

[BOOK][B] Business intelligence: data mining and optimization for decision making

C Vercellis - 2011 - books.google.com
Business intelligence is a broad category of applications and technologies for gathering,
providing access to, and analyzing data for the purpose of helping enterprise users make …

A reality check for data snooping

H White - Econometrica, 2000 - Wiley Online Library
Data snooping occurs when a given set of data is used more than once for purposes of
inference or model selection. When such data reuse occurs, there is always the possibility …

Learning when training data are costly: The effect of class distribution on tree induction

GM Weiss, F Provost - Journal of artificial intelligence research, 2003 - jair.org
For large, real-world inductive learning problems, the number of training examples often
must be limited due to the costs associated with procuring, preparing, and storing the …