A comprehensive survey of clustering algorithms

D Xu, Y Tian - Annals of data science, 2015 - Springer
Data analysis is used as a common method in modern science research, which is across
communication science, computer science and biology science. Clustering, as the basic …

A high-bias, low-variance introduction to machine learning for physicists

P Mehta, M Bukov, CH Wang, AGR Day, C Richardson… - Physics reports, 2019 - Elsevier
Machine Learning (ML) is one of the most exciting and dynamic areas of modern
research and application. The purpose of this review is to provide an introduction to the core …

A comprehensive survey of anomaly detection techniques for high dimensional big data

S Thudumu, P Branch, J Jin, J Singh - Journal of Big Data, 2020 - Springer
Anomaly detection in high dimensional data is becoming a fundamental research problem
that has various applications in the real world. However, many existing anomaly detection …

[BOOK] Data mining: concepts and techniques

J Han, J Pei, H Tong - 2022 - books.google.com
Data Mining: Concepts and Techniques, Fourth Edition introduces concepts, principles, and
methods for mining patterns, knowledge, and models from various kinds of data for diverse …

A survey on multiview clustering

G Chao, S Sun, J Bi - IEEE transactions on artificial intelligence, 2021 - ieeexplore.ieee.org
Clustering is a machine learning paradigm of dividing sample subjects into a number of
groups such that subjects in the same group are more similar to each other than to those in other groups. With …

[BOOK] Data clustering: theory, algorithms, and applications

G Gan, C Ma, J Wu - 2020 - SIAM
The monograph Data Clustering: Theory, Algorithms, and Applications was published in
2007. Starting with the common ground and knowledge for data clustering, the monograph …

Density‐based clustering

HP Kriegel, P Kröger, J Sander… - … reviews: data mining and …, 2011 - Wiley Online Library
Clustering refers to the task of identifying groups or clusters in a data set. In density‐based
clustering, a cluster is a set of data objects spread in the data space over a contiguous …

A survey on unsupervised outlier detection in high‐dimensional numerical data

A Zimek, E Schubert, HP Kriegel - Statistical Analysis and Data …, 2012 - Wiley Online Library
High‐dimensional data in Euclidean space pose special challenges to data mining
algorithms. These challenges are often indiscriminately subsumed under the term 'curse of …

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

GO Campos, A Zimek, J Sander… - Data mining and …, 2016 - Springer
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data
mining research. Little is known regarding the strengths and weaknesses of different …

An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets

G Kovács - Applied Soft Computing, 2019 - Elsevier
Learning and mining from imbalanced datasets gained increased interest in recent years.
One simple but efficient way to increase the performance of standard machine learning …