Anomaly detection: A survey

V Chandola, A Banerjee, V Kumar - ACM computing surveys (CSUR), 2009 - dl.acm.org
Anomaly detection is an important problem that has been researched within diverse
research areas and application domains. Many anomaly detection techniques have been …

Subspace clustering for high dimensional data: a review

L Parsons, E Haque, H Liu - ACM SIGKDD explorations newsletter, 2004 - dl.acm.org
Subspace clustering is an extension of traditional clustering that seeks to find clusters in
different subspaces within a dataset. Often in high dimensional data, many dimensions are …

Research on K-value selection method of K-means clustering algorithm

C Yuan, H Yang - J, 2019 - mdpi.com
Among many clustering algorithms, the K-means clustering algorithm is widely used
because of its simple algorithm and fast convergence. However, the K-value of clustering …

BIRCH: an efficient data clustering method for very large databases

T Zhang, R Ramakrishnan, M Livny - ACM SIGMOD record, 1996 - dl.acm.org
Finding useful patterns in large datasets has attracted considerable interest recently, and
one of the most widely studied problems in this area is the identification of clusters, or …

A survey of clustering data mining techniques

P Berkhin - Grouping multidimensional data: Recent advances in …, 2006 - Springer
Clustering is the division of data into groups of similar objects. In clustering, some details are
disregarded in exchange for data simplification. Clustering can be viewed as a data …

[BOOK] Data clustering: theory, algorithms, and applications

G Gan, C Ma, J Wu - 2020 - SIAM
The monograph Data Clustering: Theory, Algorithms, and Applications was published in
2007. Starting with the common ground and knowledge for data clustering, the monograph …

[BOOK] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

Clustering huge protein sequence sets in linear time

M Steinegger, J Söding - Nature communications, 2018 - nature.com
Metagenomic datasets contain billions of protein sequences that could greatly enhance
large-scale functional annotation and structure prediction. Utilizing this enormous resource …

Duplicate record detection: A survey

AK Elmagarmid, PG Ipeirotis… - IEEE Transactions on …, 2006 - ieeexplore.ieee.org
Often, in the real world, entities have two or more representations in databases. Duplicate
records do not share a common key and/or they contain errors that make duplicate matching …

A survey of techniques for event detection in Twitter

F Atefeh, W Khreich - Computational Intelligence, 2015 - Wiley Online Library
Twitter is among the fastest‐growing microblogging and online social networking services.
Messages posted on Twitter (tweets) have been reporting everything from daily life stories to …