Computational cluster validation in post-genomic data analysis

J Handl, J Knowles, DB Kell - Bioinformatics, 2005 - academic.oup.com
Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular …

A roadmap of clustering algorithms: finding a match for a biomedical application

B Andreopoulos, A An, X Wang… - Briefings in …, 2009 - academic.oup.com
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two …

High-throughput genome scaffolding from in vivo DNA interaction frequency

N Kaplan, J Dekker - Nature Biotechnology, 2013 - nature.com
Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly …

Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

Y Loewenstein, E Portugaly, M Fromer, M Linial - Bioinformatics, 2008 - academic.oup.com
Motivation: UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire …

Three invariant Hi-C interaction patterns: applications to genome assembly

S Oddes, A Zelig, N Kaplan - Methods, 2018 - Elsevier
Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to …

A generalized enhanced quantum fuzzy approach for efficient data clustering

N Bharill, OP Patel, A Tiwari, L Mu, DL Li… - IEEE …, 2019 - ieeexplore.ieee.org
Data clustering is a challenging task to gain insights into data in various fields. In this paper, an Enhanced Quantum-Inspired Evolutionary Fuzzy C-Means (EQIE-FCM) algorithm is …

Functional annotation prediction: all for one and one for all

O Sasson, N Kaplan, M Linial - Protein Science, 2006 - Wiley Online Library
In an era of rapid genome sequencing and high‐throughput technology, automatic function prediction for a novel sequence is of utmost importance in bioinformatics. While automatic …

EVEREST: automatic identification and classification of protein domains in all protein sequences

E Portugaly, A Harel, N Linial, M Linial - BMC Bioinformatics, 2006 - Springer
Background: Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas …

Model order selection for bio-molecular data clustering

A Bertoni, G Valentini - BMC Bioinformatics, 2007 - Springer
Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically …

Gene cluster statistics with gene families

N Raghupathy, D Durand - Molecular Biology and Evolution, 2009 - academic.oup.com
Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters …