A survey of clustering algorithms for big data: Taxonomy and empirical analysis

A Fahad, N Alshatri, Z Tari, A Alamri… - IEEE transactions on …, 2014 - ieeexplore.ieee.org
Clustering algorithms have emerged as an alternative powerful meta-learning tool to
accurately analyze the massive volume of data generated by modern applications. In …

Performance evaluation of some clustering algorithms and validity indices

U Maulik, S Bandyopadhyay - IEEE Transactions on pattern …, 2002 - ieeexplore.ieee.org
In this article, we evaluate the performance of three clustering algorithms, hard K-Means,
single linkage, and a simulated annealing (SA) based technique, in conjunction with four …

An empirical comparison of four initialization methods for the k-means algorithm

JM Pena, JA Lozano, P Larranaga - Pattern recognition letters, 1999 - Elsevier
In this paper, we aim to compare empirically four initialization methods for the K-Means
algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its …

[BOOK][B] Constrained clustering: Advances in algorithms, theory, and applications

S Basu, I Davidson, K Wagstaff - 2008 - taylorfrancis.com
This volume encompasses many new types of constraints and clustering methods as well as
delivers thorough coverage of the capabilities and limitations of constrained clustering. With …

The effectiveness of Lloyd-type methods for the k-means problem

R Ostrovsky, Y Rabani, LJ Schulman… - Journal of the ACM …, 2013 - dl.acm.org
We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt
to explain its popularity (a half century after its introduction) among practitioners, and in …

Pulse: Mining customer opinions from free text

M Gamon, A Aue, S Corston-Oliver… - Advances in Intelligent …, 2005 - Springer
We present a prototype system, code-named Pulse, for mining topics and sentiment
orientation jointly from free text customer feedback. We describe the application of the …

Correlation clustering in general weighted graphs

ED Demaine, D Emanuel, A Fiat, N Immorlica - Theoretical Computer …, 2006 - Elsevier
We consider the following general correlation-clustering problem [N. Bansal, A. Blum, S.
Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of …

A unified framework for model-based clustering

S Zhong, J Ghosh - The Journal of Machine Learning Research, 2003 - jmlr.org
Model-based clustering techniques have been widely used and have shown
promising results in many applications involving complex data. This paper presents a unified …

An experimental comparison of model-based clustering methods

M Meilă, D Heckerman - Machine learning, 2001 - Springer
We compare the three basic algorithms for model-based clustering on high-dimensional
discrete-variable datasets. All three algorithms use the same underlying model: a naive …

Adaptive dimension reduction for clustering high dimensional data

C Ding, X He, H Zha, HD Simon - 2002 IEEE International …, 2002 - ieeexplore.ieee.org
It is well-known that for high dimensional data clustering, standard algorithms such as EM
and K-means are often trapped in a local minimum. Many initialization methods have been …