Transforming variables to central normality

J Raymaekers, PJ Rousseeuw - Machine Learning, 2024 - Springer
Many real data sets contain numerical features (variables) whose distribution is far from
normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it …

The cellwise minimum covariance determinant estimator

J Raymaekers, PJ Rousseeuw - Journal of the American Statistical …, 2024 - Taylor & Francis
The usual Minimum Covariance Determinant (MCD) estimator of a covariance
matrix is robust against casewise outliers. These are cases (that is, rows of the data matrix) …

GenerativeMTD: A deep synthetic data generation framework for small datasets

J Sivakumar, K Ramamurthy, M Radhakrishnan… - Knowledge-Based …, 2023 - Elsevier
Synthetic data generation for tabular data, unlike computer vision, is an emerging challenge.
When tabular data needs to be synthesized, it either faces a small dataset problem or …

The R Package Ecosystem for Robust Statistics

V Todorov - Wiley Interdisciplinary Reviews: Computational …, 2024 - Wiley Online Library
In the last few years, the number of R packages implementing different robust statistical
methods has increased substantially. There are now numerous packages for computing …

Robust discriminant analysis

M Hubert, J Raymaekers… - Wiley Interdisciplinary …, 2024 - Wiley Online Library
Discriminant analysis (DA) is one of the most popular methods for classification due to its
conceptual simplicity, low computational cost, and often solid performance. In its standard …

Challenges of cellwise outliers

J Raymaekers, PJ Rousseeuw - Econometrics and Statistics, 2024 - Elsevier
It is well-known that real data often contain outliers. The term outlier usually refers to a case,
usually denoted by a row of the n × d data matrix. In recent times a different type has come …

Fast robust correlation for high-dimensional data

J Raymaekers, PJ Rousseeuw - Technometrics, 2021 - Taylor & Francis
The product moment covariance matrix is a cornerstone of multivariate data analysis, from
which one can derive correlations, principal components, Mahalanobis distances and many …

MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers

M Hubert, PJ Rousseeuw, W Van den Bossche - Technometrics, 2019 - Taylor & Francis
Multivariate data are typically represented by a rectangular matrix (table) in which the rows
are the objects (cases) and the columns are the variables (measurements). When there are …

Multivariate outlier detection in applied data analysis: global, local, compositional and cellwise outliers

P Filzmoser, M Gregorich - Mathematical Geosciences, 2020 - Springer
Outliers are encountered in all practical situations of data analysis, regardless of the
discipline of application. However, the term outlier is not uniformly defined across all these …

Noise simulation in classification with the noisemodel R package: Applications analyzing the impact of errors with chemical data

JA Sáez - Journal of Chemometrics, 2023 - Wiley Online Library
Classification datasets created from chemical processes can be affected by errors, which
impair the accuracy of the models built. This fact highlights the importance of analyzing the …