Sketching algorithms for genomic data analysis and querying in a secure enclave

C Kockan, K Zhu, N Dokmai, N Karpov, MO Kulekci… - Nature …, 2020 - nature.com
Genome-wide association studies (GWAS), especially on rare diseases, may necessitate
exchange of sensitive genomic data between multiple institutions. Since genomic data …

FQSqueezer: k-mer-based compression of sequencing data

S Deorowicz - Scientific reports, 2020 - nature.com
The amount of data produced by modern sequencing instruments that needs to be stored is
huge. Therefore it is not surprising that a lot of work has been done in the field of specialized …

Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression

Y Liu, Z Yu, ME Dinger, J Li - Bioinformatics, 2019 - academic.oup.com
Motivation: Advanced high-throughput sequencing technologies have produced massive
amount of reads data, and algorithms have been specially designed to contract the size of …

Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression

Y Liu, J Li - PLoS Computational Biology, 2021 - journals.plos.org
Graphs such as de Bruijn graphs and OLC (overlap-layout-consensus) graphs have been
widely adopted for the de novo assembly of genomic short reads. This work studies another …

PgRC: pseudogenome-based read compressor

TM Kowalski, S Grabowski - Bioinformatics, 2020 - academic.oup.com
Motivation: The amount of sequencing data from high-throughput sequencing technologies
grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements …

FastqCLS: a FASTQ compressor for long-read sequencing via read reordering using a novel scoring model

D Lee, G Song - Bioinformatics, 2022 - academic.oup.com
Motivation: Over the past decades, vast amounts of genome sequencing data have been
produced, requiring an enormous level of storage capacity. The time and resources needed …

A novel approach to T-cell receptor beta chain (TCRB) repertoire encoding using lossless string compression

T Konstantinovsky, G Yaari - Bioinformatics, 2023 - academic.oup.com
Motivation: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding
immune responses. However, their high diversity and complexity present significant …

Transformation of FASTA files into feature vectors for unsupervised compression of short reads databases

T Tang, J Li - Journal of bioinformatics and computational biology, 2021 - World Scientific
FASTA data sets of short reads are usually generated in tens or hundreds for a biomedical
study. However, current compression of these data sets is carried out one-by-one without …

Tackling the challenges of FASTQ referential compression

A Guerra, J Lotero, JÉ Aedo… - … and biology insights, 2019 - journals.sagepub.com
The exponential growth of genomic data has recently motivated the development of
compression algorithms to tackle the storage capacity limitations in bioinformatics centers …

Genomic compression with read alignment at the decoder

Y Gershon, Y Cassuto - IEEE Journal on Selected Areas in …, 2023 - ieeexplore.ieee.org
We propose a new compression scheme for genomic data given as sequence fragments
called reads. The scheme uses a reference genome at the decoder side only, freeing the …