Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era
Background De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and
overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn …
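To make the contrast between the two graph types concrete, here is a toy Python sketch, assuming error-free reads and exact matching. The function names `de_bruijn_graph` and `overlap_graph` are hypothetical. The de Bruijn version links each distinct k-mer's (k-1)-prefix to its (k-1)-suffix. The overlap version links read i to read j when a proper suffix of i equals a prefix of j of at least `min_overlap` characters. Real assemblers use far more compact and error-tolerant constructions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some k-mer."""
    # Collapse repeated k-mers: each distinct k-mer contributes exactly one edge.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(succ) for node, succ in graph.items()}


def overlap_graph(reads, min_overlap):
    """List (i, j, length) for each ordered pair of reads whose longest proper
    suffix-prefix overlap is at least min_overlap characters."""
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            # Try the longest candidate first; full-length (containment) overlaps are skipped.
            for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                if a.endswith(b[:length]):
                    edges.append((i, j, length))
                    break
    return edges


reads = ["ACGTC", "CGTCA"]
print(de_bruijn_graph(reads, k=3))
print(overlap_graph(reads, min_overlap=3))
```

For these two reads the de Bruijn graph is the path AC → CG → GT → TC → CA, while the overlap graph has the single edge (0, 1, 4), because the last four characters of the first read begin the second.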
Inducing enhanced suffix arrays for string collections
Constructing the suffix array for a string collection is an important task that may be performed
by sorting the concatenation of all strings. In this article we present algorithms gSAIS and g …
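The baseline mentioned above, sorting all suffixes of the concatenated collection, can be sketched naively in Python. This assumes each string is terminated by its own separator, smaller than every alphabet symbol, with separators ordered by string index. The function name is hypothetical, and this is only the comparison-sorting baseline, not gSAIS, which instead induces the order.

```python
def generalized_suffix_array(strings):
    """Sorted (string_index, offset) pairs for every suffix of every string.

    Offset len(s) stands for the separator that ends string s.
    """
    # Comparing Python slices mimics distinct terminators: a suffix that is a
    # prefix of a longer one sorts first, and equal suffixes tie-break on index.
    suffixes = [(s[j:], i, j) for i, s in enumerate(strings) for j in range(len(s) + 1)]
    suffixes.sort()
    return [(i, j) for _, i, j in suffixes]


print(generalized_suffix_array(["BA", "A"]))
# [(0, 2), (1, 1), (0, 1), (1, 0), (0, 0)]
```

Materializing every suffix costs quadratic space in the string lengths, which is one reason large read collections call for more economical construction methods.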
External memory BWT and LCP computation for sequence collections with applications
Background Sequencing technologies produce larger and larger collections of
biosequences that have to be stored in compressed indices supporting fast search …
A novel fast multiple nucleotide sequence alignment method based on FM-index
H Liu, Q Zou, Y Xu - Briefings in Bioinformatics, 2022 - academic.oup.com
Multiple sequence alignment (MSA) is fundamental to many biological applications. But
most classical MSA algorithms struggle to handle large-scale multiple sequences …
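The FM-index named in the title answers pattern-counting queries by backward search over the BWT. Below is a minimal Python sketch of that query, assuming a single text terminated by `$` and a naive linear-time rank function. It illustrates the index only and does not reproduce the paper's alignment method; `bwt_of` and `fm_count` are hypothetical names.

```python
def bwt_of(text):
    """BWT of text + '$', built naively from sorted suffix start positions."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # text[-1] is '$', so the suffix starting at 0 is preceded by the terminator.
    return "".join(text[i - 1] for i in sa)


def fm_count(bwt, pattern):
    """Count occurrences of pattern using FM-index backward search."""
    # C[c] = number of BWT symbols strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(set(bwt)):
        C[ch] = total
        total += bwt.count(ch)

    def occ(ch, i):
        # Naive rank: occurrences of ch in bwt[:i]. Real indexes use sampled tables or wavelet trees.
        return bwt.count(ch, 0, i)

    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt = bwt_of("ACGTACGA")
print(fm_count(bwt, "ACG"), fm_count(bwt, "CGA"), fm_count(bwt, "GG"))  # 2 1 0
```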
phyBWT2: phylogeny reconstruction via eBWT positional clustering
Background Molecular phylogenetics studies the evolutionary relationships among the
individuals of a population through their biological sequences. It may provide insights about …
Variable-order reference-free variant discovery with the Burrows-Wheeler Transform
Background In [Prezza et al., AMB 2019], a new reference-free and alignment-free
framework for the detection of SNPs was suggested and tested. The framework, based on …
SNPs detection by eBWT positional clustering
Background Sequencing technologies keep on becoming cheaper and faster, thus putting
growing pressure on data structures designed to efficiently store raw data, and possibly …
Generalized enhanced suffix array construction in external memory
Background Suffix arrays, augmented by additional data structures, allow many string
processing problems to be solved efficiently. The external memory construction of the generalized …
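As a small in-memory illustration (not an external-memory algorithm), the sketch below pairs a generalized suffix array with one possible augmentation, assuming the additional data structure is the LCP array; the exact set of arrays varies between definitions. Suffixes are stored as (string index, offset) pairs, and the function name `generalized_esa` is hypothetical.

```python
def generalized_esa(strings):
    """Return (gsa, lcp): sorted (string_index, offset) pairs and their LCP values.

    lcp[r] is the longest common prefix of suffixes r-1 and r (lcp[0] = 0).
    Distinct terminators mean a common prefix never extends past a string's end.
    """
    order = sorted((s[j:], i, j) for i, s in enumerate(strings) for j in range(len(s) + 1))
    gsa = [(i, j) for _, i, j in order]
    lcp = [0]
    for (prev, _, _), (cur, _, _) in zip(order, order[1:]):
        k = 0
        while k < min(len(prev), len(cur)) and prev[k] == cur[k]:
            k += 1
        lcp.append(k)
    return gsa, lcp


gsa, lcp = generalized_esa(["ACA", "CA"])
print(gsa)  # [(0, 3), (1, 2), (0, 2), (1, 1), (0, 0), (0, 1), (1, 0)]
print(lcp)  # [0, 0, 0, 1, 1, 0, 2]
```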
Multithread multistring Burrows–Wheeler transform and longest common prefix array
Indexing huge collections of strings, such as those produced by the widespread sequencing
technologies, heavily relies on multistring generalizations of the Burrows–Wheeler transform …
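One common definition of the multistring BWT, assumed here, takes the symbol preceding each suffix in generalized-suffix-array order, reading the string's own terminator `$` when a suffix starts at offset 0. The naive, single-threaded Python sketch below follows that convention; other conventions exist (for example, in how terminators are ordered), and the function name is hypothetical.

```python
def multistring_bwt(strings, sep="$"):
    """BWT of a string collection, each string ending in its own terminator.

    Suffixes are ordered as in a generalized suffix array; equal suffixes tie-break
    on string index, as if terminator i were smaller than terminator i+1.
    """
    order = sorted((s[j:], i, j) for i, s in enumerate(strings) for j in range(len(s) + 1))
    # The symbol before a suffix at offset 0 is the string's own terminator.
    return "".join(strings[i][j - 1] if j > 0 else sep for _, i, j in order)


print(multistring_bwt(["ACA", "CA"]))  # AACC$A$
```

The output has one symbol per character plus one terminator per string, so its length is the total collection size plus the number of strings.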
Space-efficient computation of the LCP array from the Burrows-Wheeler transform
N Prezza, G Rosone - arXiv preprint arXiv:1901.05226, 2019 - arxiv.org
We show that the Longest Common Prefix Array of a text collection of total size $n$ on
alphabet $[1,\sigma]$ can be computed from the Burrows-Wheeler transformed collection in …