Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era

R Rizzi, S Beretta, M Patterson, Y Pirola, M Previtali… - Quantitative …, 2019 - Springer
Background De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and
overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn …

Inducing enhanced suffix arrays for string collections

FA Louza, S Gog, GP Telles - Theoretical Computer Science, 2017 - Elsevier
Constructing the suffix array for a string collection is an important task that may be performed
by sorting the concatenation of all strings. In this article we present algorithms gSAIS and g …

External memory BWT and LCP computation for sequence collections with applications

L Egidi, FA Louza, G Manzini, GP Telles - Algorithms for Molecular Biology, 2019 - Springer
Background Sequencing technologies produce larger and larger collections of
biosequences that have to be stored in compressed indices supporting fast search …

A novel fast multiple nucleotide sequence alignment method based on FM-index

H Liu, Q Zou, Y Xu - Briefings in Bioinformatics, 2022 - academic.oup.com
Multiple sequence alignment (MSA) is fundamental to many biological applications. But
most classical MSA algorithms are difficult to handle large-scale multiple sequences …

phyBWT2: phylogeny reconstruction via eBWT positional clustering

V Guerrini, A Conte, R Grossi, G Liti, G Rosone… - Algorithms for Molecular …, 2023 - Springer
Background Molecular phylogenetics studies the evolutionary relationships among the
individuals of a population through their biological sequences. It may provide insights about …

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

N Prezza, N Pisanti, M Sciortino, G Rosone - BMC bioinformatics, 2020 - Springer
Abstract Background In [Prezza et al., AMB 2019], a new reference-free and alignment-free
framework for the detection of SNPs was suggested and tested. The framework, based on …

SNPs detection by eBWT positional clustering

N Prezza, N Pisanti, M Sciortino, G Rosone - Algorithms for Molecular …, 2019 - Springer
Background Sequencing technologies keep on turning cheaper and faster, thus putting a
growing pressure for data structures designed to efficiently store raw data, and possibly …

Generalized enhanced suffix array construction in external memory

FA Louza, GP Telles, S Hoffmann… - Algorithms for Molecular …, 2017 - Springer
Background Suffix arrays, augmented by additional data structures, allow solving efficiently
many string processing problems. The external memory construction of the generalized …

Multithread multistring Burrows–Wheeler transform and longest common prefix array

P Bonizzoni, G Della Vedova, Y Pirola… - Journal of …, 2019 - liebertpub.com
Indexing huge collections of strings, such as those produced by the widespread sequencing
technologies, heavily relies on multistring generalizations of the Burrows–Wheeler transform …

Space-efficient computation of the LCP array from the Burrows-Wheeler transform

N Prezza, G Rosone - arXiv preprint arXiv:1901.05226, 2019 - arxiv.org
We show that the Longest Common Prefix Array of a text collection of total size n on
alphabet [1, σ] can be computed from the Burrows-Wheeler transformed collection in …