Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
A survey of BWT variants for string collections
D Cenzato, Z Lipták - Bioinformatics, 2024 - academic.oup.com
Motivation In recent years, the focus of bioinformatics research has moved from individual
sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler …
Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …
Dynamic suffix array with polylogarithmic queries and updates
The suffix array SA[1..n] of a text T of length n is a permutation of {1, …, n} describing the
lexicographical ordering of suffixes of T and is considered to be one of the most important …
An upper bound and linear-space queries on the LZ-End parsing
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
Internal pattern matching queries in a text and applications
We consider several types of internal queries, that is, questions about fragments of a given
text specified in constant space by their locations in the text. Our main result is an optimal data …
Faster approximate pattern matching: A unified approach
In the approximate pattern matching problem, given a text T, a pattern P, and a threshold k,
the task is to find (the starting positions of) all substrings of T that are at distance at most k …
Quantum Speed-Ups for String Synchronizing Sets, Longest Common Substring, and k-mismatch Matching
Longest common substring (LCS) is an important text processing problem, which has
recently been investigated in the quantum query model. The decision version of this …
Computing MEMs and Relatives on Repetitive Text Collections
G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a …