Survey of state-of-the-art mixed data clustering algorithms
Mixed data comprises both numeric and categorical features, and mixed datasets occur
frequently in many domains, such as health, finance, and marketing. Clustering is often …
Skinny-dip: clustering in a sea of noise
S Maurus, C Plant - Proceedings of the 22nd ACM SIGKDD international …, 2016 - dl.acm.org
Can we find heterogeneous clusters hidden in data sets with 80% noise? Although such
settings occur in the real world, we struggle to find methods from the abundance of …
Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study
Determining the right code reviewer for a given code change requires understanding the
characteristics of the changed code, identifying the skills of each potential reviewer …
Towards an optimal subspace for k-means
Is there an optimal dimensionality reduction for k-means, revealing the prominent cluster
structure hidden in the data? We propose SUBKMEANS, which extends the classic k-means …
Non-redundant subspace clusterings with nr-kmeans and nr-dipmeans
A huge object collection in high-dimensional space can often be clustered in more than one
way, for instance, objects could be clustered by their shape or alternatively by their color …
Density-based multiscale analysis for clustering in strong noise settings with varying densities
Finding meaningful clustering patterns in data can be very challenging when the clusters are
of arbitrary shapes, different sizes, or densities, and especially when the data set contains …
Details (Don't) Matter: Isolating Cluster Information in Deep Embedded Spaces.
Deep clustering techniques combine representation learning with clustering objectives to
improve their performance. Among existing deep clustering techniques, autoencoder-based …
Enhancing cluster analysis via topological manifold learning
M Herrmann, D Kazempour, F Scheipl… - Data Mining and …, 2024 - Springer
We discuss topological aspects of cluster analysis and show that inferring the topological
structure of a dataset before clustering it can considerably enhance cluster detection: we …
Large-scale subspace clustering by fast regression coding
Large-Scale Subspace Clustering (LSSC) is an interesting and important problem
in the big data era. However, most existing methods (i.e., sparse or low-rank subspace clustering) …
Extension of the Dip-test Repertoire - Efficient and Differentiable p-value Calculation for Clustering
Over the last decade, the Dip-test of unimodality has gained increasing interest in the data
mining community as it is a parameter-free statistical test that reliably rates the modality in …