PgRC: pseudogenome-based read compressor
TM Kowalski, S Grabowski - Bioinformatics, 2020 - academic.oup.com
Motivation The amount of sequencing data from high-throughput sequencing technologies
grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements …
grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements …
Constructing small genome graphs via string compression
Motivation The size of a genome graph—the space required to store the nodes, node labels
and edges—affects the efficiency of operations performed on it. For example, the time …
and edges—affects the efficiency of operations performed on it. For example, the time …
Making de Bruijn Graphs Eulerian
A directed multigraph is called Eulerian if it has a circuit which uses each edge exactly once.
Euler's theorem tells us that a weakly connected directed multigraph is Eulerian if and only if …
Euler's theorem tells us that a weakly connected directed multigraph is Eulerian if and only if …
[PDF][PDF] Algorithmic Foundations of Genome Graph Construction and Comparison
Y Qiu - 2023 - kingsfordlab.cbd.cmu.edu
Pangenomic studies have enabled a more accurate depiction of the human genome
landscape. Genome graphs are suitable data structures for analyzing collections of …
landscape. Genome graphs are suitable data structures for analyzing collections of …
[KNJIGA][B] New Applications of the Nearest-Neighbor Chain Algorithm
NM Grande - 2019 - search.proquest.com
The nearest-neighbor chain algorithm was proposed in the eighties as a way to speed up
certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its …
certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its …
Compressed multiple pattern matching
D Kosolobov, N Sivukhin - arxiv preprint arxiv:1811.01248, 2018 - arxiv.org
Given $ d $ strings over the alphabet $\{0, 1,\ldots,\sigma {-} 1\} $, the classical Aho--
Corasick data structure allows us to find all $ occ $ occurrences of the strings in any text $ T …
Corasick data structure allows us to find all $ occ $ occurrences of the strings in any text $ T …
[PDF][PDF] Fast Implementation of Shortest Common Superstring Approximation with Application to Relative Lempel-Ziv Dictionary Construction
A Kilpinen - 2022 - helda.helsinki.fi
Textually represented data–text–is one of the most common types of preserved information.
It occurs in many forms, eg, source code, markup languages and plain text. Also, DNA and …
It occurs in many forms, eg, source code, markup languages and plain text. Also, DNA and …
GREEDY SHORTEST SUPERSTRING WITH DELAYED RANDOM CHOICE
The shortest superstring problem for a given set of strings is to find a string of minimum
length such that each input string is a substring of the resulting string. This problem is known …
length such that each input string is a substring of the resulting string. This problem is known …
[PDF][PDF] DISPLACEMENT ACTIVITY: SOLVING THE SHORTEST COMMON SUPERSTRING PROBLEM VIA DEEP REINFORCEMENT LEARNING
V Ayyappan, JY Guo, RFL Perry, HF VanRenterghem - 2019 - rflperry.github.io
Hand-crafted heuristics for solving NP-hard optimization problems are fast and often
effective; yet they lack theoretical guarantees and often fail to generalize across problem …
effective; yet they lack theoretical guarantees and often fail to generalize across problem …