Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

A survey of BWT variants for string collections

D Cenzato, Z Lipták - Bioinformatics, 2024 - academic.oup.com
Motivation: In recent years, the focus of bioinformatics research has moved from individual
sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler …

Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space

D Kempa, T Kociumaka - 2023 IEEE 64th Annual Symposium …, 2023 - ieeexplore.ieee.org
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …

Dynamic suffix array with polylogarithmic queries and updates

D Kempa, T Kociumaka - Proceedings of the 54th Annual ACM SIGACT …, 2022 - dl.acm.org
The suffix array SA[1..n] of a text T of length n is a permutation of {1, …, n} describing the
lexicographical ordering of suffixes of T and is considered to be one of the most important …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

Internal pattern matching queries in a text and applications

T Kociumaka, J Radoszewski, W Rytter, T Waleń - SIAM Journal on …, 2024 - SIAM
We consider several types of internal queries, that is, questions about fragments of a given
text T specified in constant space by their locations in T. Our main result is an optimal data …

Faster approximate pattern matching: A unified approach

P Charalampopoulos, T Kociumaka… - 2020 IEEE 61st …, 2020 - ieeexplore.ieee.org
In the approximate pattern matching problem, given a text T, a pattern P, and a threshold k,
the task is to find (the starting positions of) all substrings of T that are at distance at most k …

Quantum Speed-Ups for String Synchronizing Sets, Longest Common Substring, and k-mismatch Matching

C Jin, J Nogler - ACM Transactions on Algorithms, 2024 - dl.acm.org
Longest common substring (LCS) is an important text processing problem, which has
recently been investigated in the quantum query model. The decision version of this …

Computing MEMs and Relatives on Repetitive Text Collections

G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a …