An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data

L Jing, MK Ng, JZ Huang - IEEE Transactions on knowledge …, 2007 - ieeexplore.ieee.org
This paper presents a new k-means type algorithm for clustering high-dimensional objects in
subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather …

Enhanced soft subspace clustering integrating within-cluster and between-cluster information

Z Deng, KS Choi, FL Chung, S Wang - Pattern recognition, 2010 - Elsevier
While within-cluster information is commonly utilized in most soft subspace clustering
approaches in order to develop the algorithms, other important information such as between …

A feature group weighting method for subspace clustering of high-dimensional data

X Chen, Y Ye, X Xu, JZ Huang - Pattern Recognition, 2012 - Elsevier
This paper proposes a new method to weight subspaces in feature groups and individual
features for clustering high-dimensional data. In this method, the features of high …

Subspace clustering of categorical and numerical data with an unknown number of clusters

H Jia, YM Cheung - IEEE transactions on neural networks and …, 2017 - ieeexplore.ieee.org
In clustering analysis, data attributes may have different contributions to the detection of
various clusters. To solve this problem, the subspace clustering technique has been …

Extensions of kmeans-type algorithms: A new clustering framework by integrating intracluster compactness and intercluster separation

X Huang, Y Ye, H Zhang - IEEE transactions on neural …, 2013 - ieeexplore.ieee.org
Kmeans-type clustering aims at partitioning a data set into clusters such that the objects in a
cluster are compact and the objects in different clusters are well separated. However, most …

Improving authorship attribution: optimizing Burrows' Delta method

PWH Smith, W Aldridge - Journal of Quantitative Linguistics, 2011 - Taylor & Francis
Burrows' Delta Method (Burrows,) is a leading method of authorship attribution. It
can be used to shortlist potential authors from a list or to even identify potential authors. The …

Score-based likelihood ratios for linguistic text evidence with a bag-of-words model

S Ishihara - Forensic Science International, 2021 - Elsevier
The likelihood ratio paradigm for quantifying the strength of evidence has been researched
in many fields of forensic science. Within this paradigm, score-based approaches for …

Subspace Clustering of Text Documents with Feature Weighting K-Means Algorithm

L Jing, MK Ng, J Xu, JZ Huang - … in Knowledge Discovery and Data Mining …, 2005 - Springer
This paper presents a new method to solve the problem of clustering large and complex text
data. The method is based on a new subspace clustering algorithm that automatically …

DSKmeans: a new kmeans-type approach to discriminative subspace clustering

X Huang, Y Ye, H Guo, Y Cai, H Zhang, Y Li - Knowledge-Based Systems, 2014 - Elsevier
Most kmeans-type clustering algorithms rely only on intra-cluster compactness, i.e., the
dispersions of a cluster. Inter-cluster separation which is widely used in classification …

On the use of side information for mining text data

CC Aggarwal, Y Zhao, PS Yu - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
In many text mining applications, side-information is available along with the text documents.
Such side-information may be of different kinds, such as document provenance information …