Avoiding inferior clusterings with misspecified Gaussian mixture models

SR Kasa, V Rajan - Scientific Reports, 2023 - nature.com
Clustering is a fundamental tool for exploratory data analysis, and is ubiquitous across
scientific disciplines. Gaussian Mixture Model (GMM) is a popular probabilistic and …

Parsimonious mixtures of multivariate contaminated normal distributions

A Punzo, PD McNicholas - Biometrical Journal, 2016 - Wiley Online Library
A mixture of multivariate contaminated normal distributions is developed for model‐based
clustering. In addition to the parameters of the classical normal mixture, our contaminated …

Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering

P Coretto, C Hennig - Journal of the American Statistical …, 2016 - Taylor & Francis
The two main topics of this article are the introduction of the “optimally tuned robust improper
maximum likelihood estimator” (OTRIMLE) for robust clustering based on the multivariate …

A multivariate hidden Markov model for the identification of sea regimes from incomplete skewed and circular time series

J Bulla, F Lagona, A Maruotti, M Picone - Journal of Agricultural, Biological …, 2012 - Springer
The identification of sea regimes from environmental multivariate time series is complicated
by the mixed linear–circular support of the data, by the occurrence of missing values, by the …

Addressing overfitting and underfitting in Gaussian model-based clustering

JL Andrews - Computational Statistics & Data Analysis, 2018 - Elsevier
The expectation–maximization (EM) algorithm is a common approach for parameter
estimation in the context of cluster analysis using finite mixture models. This approach …

A globally convergent algorithm for lasso-penalized mixture of linear regression models

LR Lloyd-Jones, HD Nguyen, GJ McLachlan - Computational Statistics & …, 2018 - Elsevier
Variable selection is an old and pervasive problem in regression analysis. One solution is to
impose a lasso penalty to shrink parameter estimates toward zero and perform continuous …

Anomaly and Novelty detection for robust semi-supervised learning

A Cappozzo, F Greselin, TB Murphy - Statistics and Computing, 2020 - Springer
Three important issues are often encountered in Supervised and Semi-Supervised
Classification: class memberships are unreliable for some training units (label noise), a …

A hidden Markov approach to the analysis of space–time environmental data with linear and circular components

F Lagona, M Picone, A Maruotti, S Cosoli - … Environmental Research and …, 2015 - Springer
The analysis of bivariate space–time series with linear and circular components is
complicated by (1) multiple correlations, across time, space and between variables, (2) …

Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering

P Coretto, C Hennig - Journal of Machine Learning Research, 2017 - jmlr.org
The robust improper maximum likelihood estimator (RIMLE) is a new method for robust
multivariate clustering finding approximately Gaussian clusters. It maximizes a …

A general hidden state random walk model for animal movement

A Nicosia, T Duchesne, LP Rivest, D Fortin - Computational Statistics & …, 2017 - Elsevier
A general hidden state random walk model is proposed to describe the movement of an
animal that takes into account movement taxis with respect to features of the environment. A …