Indexing highly repetitive string collections, part II: compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

[BOOK][B] The Logical Approach to Automatic Sequences

J Shallit - 2022 - books.google.com
Automatic sequences are sequences over a finite alphabet generated by a finite-state
machine. This book presents a novel viewpoint on automatic sequences, and more …

Resolution of the Burrows-Wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space

D Kempa, T Kociumaka - 2023 IEEE 64th Annual Symposium …, 2023 - ieeexplore.ieee.org
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …

Optimal-time dictionary-compressed indexes

AR Christiansen, MB Ettienne, T Kociumaka… - ACM Transactions on …, 2020 - dl.acm.org
We describe the first self-indexes able to count and locate pattern occurrences in optimal
time within a space bounded by the size of the most popular dictionary compressors. To …

Internal pattern matching queries in a text and applications

T Kociumaka, J Radoszewski, W Rytter, T Waleń - SIAM Journal on …, 2024 - SIAM
We consider several types of internal queries, that is, questions about fragments of a given
text T specified in constant space by their locations in T. Our main result is an optimal data …

Sensitivity of string compressors and repetitiveness measures

T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier
The sensitivity of a string compression algorithm C asks how much the output size C(T) for
an input string T can increase when a single character edit operation is performed on T. This …

Toward a definitive compressibility measure for repetitive sequences

T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the kth order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …

An upper bound and linear-space queries on the LZ-end parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

Sigmoni: classification of nanopore signal with a compressed pangenome index

VS Shivakumar, OY Ahmed, S Kovaka, M Zakeri… - …, 2024 - academic.oup.com
Improvements in nanopore sequencing necessitate efficient classification methods,
including pre-filtering and adaptive sampling algorithms that enrich for reads of interest …