PgRC: pseudogenome-based read compressor

TM Kowalski, S Grabowski - Bioinformatics, 2020 - academic.oup.com
Motivation The amount of sequencing data from high-throughput sequencing technologies
grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements …

Constructing small genome graphs via string compression

Y Qiu, C Kingsford - Bioinformatics, 2021 - academic.oup.com
Motivation The size of a genome graph—the space required to store the nodes, node labels
and edges—affects the efficiency of operations performed on it. For example, the time …

Making de Bruijn Graphs Eulerian

G Bernardini, H Chen, G Loukides, SP Pissis… - LEIBNIZ …, 2022 - air.unimi.it
A directed multigraph is called Eulerian if it has a circuit which uses each edge exactly once.
Euler's theorem tells us that a weakly connected directed multigraph is Eulerian if and only if …

[PDF] Algorithmic Foundations of Genome Graph Construction and Comparison

Y Qiu - 2023 - kingsfordlab.cbd.cmu.edu
Pangenomic studies have enabled a more accurate depiction of the human genome
landscape. Genome graphs are suitable data structures for analyzing collections of …

[BOOK] New Applications of the Nearest-Neighbor Chain Algorithm

NM Grande - 2019 - search.proquest.com
The nearest-neighbor chain algorithm was proposed in the eighties as a way to speed up
certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its …

Compressed multiple pattern matching

D Kosolobov, N Sivukhin - arXiv preprint arXiv:1811.01248, 2018 - arxiv.org
Given $d$ strings over the alphabet $\{0, 1, \ldots, \sigma - 1\}$, the classical Aho-Corasick
data structure allows us to find all $occ$ occurrences of the strings in any text $T$ …

[PDF] Fast Implementation of Shortest Common Superstring Approximation with Application to Relative Lempel-Ziv Dictionary Construction

A Kilpinen - 2022 - helda.helsinki.fi
Textually represented data (text) is one of the most common types of preserved information.
It occurs in many forms, e.g., source code, markup languages and plain text. Also, DNA and …

Greedy Shortest Superstring with Delayed Random Choice

MRA Sara, MFJ Klaib, M Hasan - International Journal of …, 2020 - journal.ump.edu.my
The shortest superstring problem for a given set of strings is to find a string of minimum
length such that each input string is a substring of the resulting string. This problem is known …

[PDF] Displacement Activity: Solving the Shortest Common Superstring Problem via Deep Reinforcement Learning

V Ayyappan, JY Guo, RFL Perry, HF VanRenterghem - 2019 - rflperry.github.io
Hand-crafted heuristics for solving NP-hard optimization problems are fast and often
effective; yet they lack theoretical guarantees and often fail to generalize across problem …