Categorical data clustering: 25 years beyond K-modes

T Dinh, W Hauchi, P Fournier-Viger, D Lisik… - Expert Systems with …, 2025 - Elsevier
The clustering of categorical data is a common and important task in computer science,
offering profound implications across a spectrum of applications. Unlike purely numerical …

Categorical data clustering: A bibliometric analysis and taxonomy

M Cendana, RJ Kuo - Machine Learning and Knowledge Extraction, 2024 - mdpi.com
Numerous real-world applications apply categorical data clustering to find hidden patterns in
the data. The K-modes-based algorithm is a popular algorithm for solving common issues in …

A multi-view kernel clustering framework for categorical sequences

K Xu, L Chen, S Wang - Expert Systems with Applications, 2022 - Elsevier
Multi-view clustering, which optimally integrates complementary information from different
views to improve clustering performance, has drawn considerable attention in recent years …

A generalized multi-aspect distance metric for mixed-type data clustering

E Mousavi, M Sehhati - Pattern Recognition, 2023 - Elsevier
Distance calculation is straightforward when working with pure categorical or pure numerical
data sets. Defining a unified distance to improve the clustering performance for a mixed data …

IP2Vec: Learning similarities between IP addresses

M Ring, A Dallmann, D Landes… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
IP addresses are a central part of packet- and flow-based network data. However,
visualization and similarity computation of IP addresses are challenging due to the missing …

Subspace clustering of categorical and numerical data with an unknown number of clusters

H Jia, YM Cheung - IEEE transactions on neural networks and …, 2017 - ieeexplore.ieee.org
In clustering analysis, data attributes may have different contributions to the detection of
various clusters. To solve this problem, the subspace clustering technique has been …

Self-adaptive multiprototype-based competitive learning approach: A k-means-type algorithm for imbalanced data clustering

Y Lu, YM Cheung, YY Tang - IEEE transactions on cybernetics, 2019 - ieeexplore.ieee.org
The class imbalance problem has been extensively studied in recent years, but imbalanced
data clustering in unsupervised environment, that is, the number of samples among clusters …

Graph-based dissimilarity measurement for cluster analysis of any-type-attributed data

Y Zhang, YM Cheung - IEEE transactions on neural networks …, 2022 - ieeexplore.ieee.org
Heterogeneous attribute data composed of attributes with different types of values are quite
common in a variety of real-world applications. As data annotation is usually expensive …

QGRL: quaternion graph representation learning for heterogeneous feature data clustering

J Chen, Y Ji, R Zou, Y Zhang, Y Cheung - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Clustering is one of the most commonly used techniques for unsupervised data analysis. As
real data sets are usually composed of numerical and categorical features that are …

SIGMM: A novel machine learning algorithm for spammer identification in industrial mobile cloud computing

T Qiu, H Wang, K Li, H Ning… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
An industrial mobile network is crucial for industrial production in the Internet of Things. It
guarantees the normal function of machines and the normalization of industrial production …