Automatic and precise data validation for machine learning

S Shankar, L Fawaz, K Gyllstrom… - Proceedings of the 32nd …, 2023 - dl.acm.org
Machine learning (ML) models in production pipelines are frequently retrained on the latest
partitions of large, continually-growing datasets. Due to engineering bugs, partitions in such …

Metricizing the Euclidean space towards desired distance relations in point clouds

S Rass, S König, S Ahmad… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
We introduce the concept of an ε-semimetric that satisfies the same axioms as a topological
metric, except for an arbitrarily small allowance to violate the triangle inequality. Under this …

A comprehensive survey of fast graph clustering

J Xue, L Xing, Y Wang, X Fan, L Kong, Q Zhang, F Nie… - Vicinagearth, 2024 - Springer
Graph clustering methods are popular due to their ability to discover clusters with arbitrary
shapes. However, with the emergence of large-scale datasets, the efficiency of graph …

Data with density-based clusters: A generator for systematic evaluation of clustering algorithms

P Jahn, CMM Frey, A Beer, C Leiber, T Seidl - Joint European Conference …, 2024 - Springer
Mining data containing density-based clusters is well-established and widespread but faces
problems when it comes to systematic and reproducible comparison and evaluation …

Ensemble Clustering based on Meta-Learning and Hyperparameter Optimization

D Treder-Tschechlov, M Fritz, H Schwarz… - Proceedings of the …, 2024 - dl.acm.org
Efficient clustering algorithms, such as k-Means, are often used in practice because they
scale well for large datasets. However, they are only able to detect simple data …

Moving fast with broken data

S Shankar, L Fawaz, K Gyllstrom… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning (ML) models in production pipelines are frequently retrained on the latest
partitions of large, continually-growing datasets. Due to engineering bugs, partitions in such …

Direct Spectral Clustering with New Graph Learning for Better Fitting

L Kong, J Xue, F Nie, X Li - IEEE Transactions on Knowledge …, 2025 - ieeexplore.ieee.org
Traditional spectral clustering methods struggle with scalability and robustness in large
datasets due to their reliance on similarity matrices and eigenvalue decomposition. We …

A method framework of cruciate ligaments segmentation and reconstruction from MRI images

A Humayun, B Liu, M Rehman… - … and Health Care, 2025 - journals.sagepub.com
Segmenting
anterior and posterior cruciate ligaments (ACL/PCL) presents challenges in medical imaging …

I Want 'Em All (At Once) -- Ultrametric Cluster Hierarchies

A Draganov, P Weber, RSM Jørgensen, A Beer… - arXiv preprint arXiv …, 2025 - arxiv.org
Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a
tree of clusterings from which a partition can be chosen. This paper generalizes these ideas …

Learning from complex networks

CMM Frey - 2023 - edoc.ub.uni-muenchen.de
Graph Theory has proven to be a universal language for describing modern complex
systems. The elegant theoretical framework of graphs drew the researchers' attention over …