Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application

G Lightbody, V Haberland, F Browne… - Briefings in …, 2019 - academic.oup.com
There has been an exponential growth in the performance and output of sequencing
technologies (omics data) with full genome sequencing now producing gigabases of reads …

Genomic data compression

M Hernaez, D Pavlichin, T Weissman… - Annual Review of …, 2019 - annualreviews.org
Recently, there has been growing interest in genome sequencing, driven by advances in
sequencing technology, in terms of both efficiency and affordability. These developments …

A survey on data compression methods for biological sequences

M Hosseini, D Pratas, AJ Pinho - Information, 2016 - mdpi.com
The ever increasing growth of the production of high-throughput sequencing data poses a
serious challenge to the storage, processing and transmission of these data. As frequently …

Representation of k-Mer Sets Using Spectrum-Preserving String Sets

A Rahman, P Medvedev - Journal of Computational Biology, 2021 - liebertpub.com
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to
represent a set of k-mers is important for improving the scalability of bioinformatics analyses …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background: The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - Bioinformatics, 2019 - academic.oup.com
DNA sequence databases use compression such as gzip to reduce the required storage
space and network transmission time. We describe Nucleotide Archival Format (NAF)—a …

Efficient compression of genomic sequences

D Pratas, AJ Pinho… - 2016 Data compression …, 2016 - ieeexplore.ieee.org
The number of genomic sequences is growing substantially. Besides discarding part of the
data, the only efficient possibility for coping with this trend is data compression. We present …

Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - GigaScience, 2020 - academic.oup.com
Background: Nearly all molecular sequence databases currently use gzip for data
compression. Ongoing rapid accumulation of stored data calls for a more efficient …

FQSqueezer: k-mer-based compression of sequencing data

S Deorowicz - Scientific reports, 2020 - nature.com
The amount of data produced by modern sequencing instruments that needs to be stored is
huge. Therefore it is not surprising that a lot of work has been done in the field of specialized …

Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis

S Chandak, K Tatwawadi, T Weissman - Bioinformatics, 2018 - academic.oup.com
Motivation: New Generation Sequencing (NGS) technologies for genome
sequencing produce large amounts of short genomic reads per experiment, which are highly …