A comprehensive survey of anomaly detection techniques for high dimensional big data

S Thudumu, P Branch, J Jin, J Singh - Journal of big data, 2020 - Springer
Anomaly detection in high dimensional data is becoming a fundamental research problem
that has various applications in the real world. However, many existing anomaly detection …

A high-bias, low-variance introduction to machine learning for physicists

P Mehta, M Bukov, CH Wang, AGR Day, C Richardson… - Physics reports, 2019 - Elsevier
Machine Learning (ML) is one of the most exciting and dynamic areas of modern
research and application. The purpose of this review is to provide an introduction to the core …

An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets

G Kovács - Applied soft computing, 2019 - Elsevier
Learning and mining from imbalanced datasets has gained increased interest in recent years.
One simple but efficient way to increase the performance of standard machine learning …

A survey on multiview clustering

G Chao, S Sun, J Bi - IEEE transactions on artificial intelligence, 2021 - ieeexplore.ieee.org
Clustering is a machine learning paradigm of dividing sample subjects into a number of
groups such that subjects in the same group are more similar to each other than to those in other groups. With …

A comprehensive survey of clustering algorithms

D Xu, Y Tian - Annals of data science, 2015 - Springer
Data analysis is a common method in modern scientific research, spanning
communication science, computer science, and biological science. Clustering, as the basic …

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

GO Campos, A Zimek, J Sander… - Data mining and …, 2016 - Springer
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data
mining research. Little is known regarding the strengths and weaknesses of different …

Comparison of clustering methods for high‐dimensional single‐cell flow and mass cytometry data

LM Weber, MD Robinson - Cytometry Part A, 2016 - Wiley Online Library
Recent technological developments in high‐dimensional flow cytometry and mass cytometry
(CyTOF) have made it possible to detect expression levels of dozens of protein markers in …

HTMD: high-throughput molecular dynamics for molecular discovery

S Doerr, MJ Harvey, F Noé… - Journal of chemical …, 2016 - ACS Publications
Recent advances in molecular simulations have allowed scientists to investigate slower
biological processes than ever before. Together with these advances came an explosion of …

Enhanced fuzzy clustering for incomplete instance with evidence combination

Z Liu, S Letchmunan - ACM Transactions on Knowledge Discovery from …, 2024 - dl.acm.org
Clustering incomplete instances is still a challenging task, since missing values may make
the cluster information ambiguous, leading to uncertainty and imprecision in the results. This …

A survey on unsupervised outlier detection in high‐dimensional numerical data

A Zimek, E Schubert, HP Kriegel - Statistical Analysis and Data …, 2012 - Wiley Online Library
High‐dimensional data in Euclidean space pose special challenges to data mining
algorithms. These challenges are often indiscriminately subsumed under the term 'curse of …