Systematic review of clustering high-dimensional and large datasets

D Pandove, S Goel, R Rani - … on Knowledge Discovery from Data (TKDD …, 2018 - dl.acm.org
Technological advancement has enabled us to store and process huge amount of data in
relatively short spans of time. The nature of data is rapidly changing, particularly its …

Hierarchical agglomerative graph clustering in nearly-linear time

L Dhulipala, D Eisenstat, J Łącki… - … on machine learning, 2021 - proceedings.mlr.press
We study the widely-used hierarchical agglomerative clustering (HAC) algorithm on edge-
weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph …

Hierarchical agglomerative graph clustering in poly-logarithmic depth

L Dhulipala, D Eisenstat, J Łącki… - Advances in Neural …, 2022 - proceedings.neurips.cc
Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of
significant interest due to the massive size of real-world datasets. At the same time …

Subquadratic high-dimensional hierarchical clustering

A Abboud, V Cohen-Addad… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider the widely-used average-linkage, single-linkage, and Ward's methods for
computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show …

Objective-based hierarchical clustering of deep embedding vectors

S Naumov, G Yaroslavtsev, D Avdiukhin - Proceedings of the AAAI …, 2021 - ojs.aaai.org
We initiate a comprehensive experimental study of objective-based hierarchical clustering
methods on massive datasets consisting of deep embedding vectors from computer vision …

PSEUDo: Interactive pattern search in multivariate time series with locality-sensitive hashing and relevance feedback

Y Yu, D Kruyff, J Jiao, T Becker… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
We present PSEUDo, a visual pattern retrieval tool for multivariate time series. It aims to
overcome the uneconomic (re-) training problem accompanying deep learning-based …

Parchain: A framework for parallel hierarchical agglomerative clustering using nearest-neighbor chain

S Yu, Y Wang, Y Gu, L Dhulipala, J Shun - arXiv preprint arXiv:2106.04727, 2021 - arxiv.org
This paper studies the hierarchical clustering problem, where the goal is to produce a
dendrogram that represents clusters at varying scales of a data set. We propose the …

Trigonometric words ranking model for spam message classification

SM Hadi, AH Alsaeedi, D Al‐Shammary… - IET …, 2022 - Wiley Online Library
The significant increase in the volume of fake (spam) messages has led to an urgent need to
develop and implement a robust anti‐spam method. Several of the current anti‐spam …

User profiling in elderly healthcare services in China: Scalper detection

C Xie, H Cai, Y Yang, L Jiang… - IEEE journal of biomedical …, 2018 - ieeexplore.ieee.org
Driven by the automation technologies and health informatics of Industry 4.0, hospitals in
China have deployed a complete automation system/platform for healthcare services …

Fitting metrics and ultrametrics with minimum disagreements

V Cohen-Addad, C Fan, E Lee, A De Mesmay - SIAM Journal on Computing, 2025 - SIAM
Given x recording pairwise distances, the Metric Violation Distance problem asks to compute
the ℓ0 distance between x and the metric cone; i.e., modify the minimum number of entries of x to …