Human genome sequencing at the population scale: a primer on high-throughput DNA sequencing and analysis

RL Goldfeder, DP Wall, MJ Khoury… - American journal of …, 2017 - academic.oup.com
Most human diseases have underlying genetic causes. To better understand the impact of
genes on disease and its implications for medicine and public health, researchers have …

Algorithms and design strategies towards automated glycoproteomics analysis

H Hu, K Khatri, J Zaia - Mass spectrometry reviews, 2017 - Wiley Online Library
Glycoproteomics involves the study of glycosylation events on protein sequences ranging
from purified proteins to whole proteome scales. Understanding these complex post …

PyCOMPSs: Parallel computational workflows in Python

E Tejedor, Y Becerra, G Alomar… - … Journal of High …, 2017 - journals.sagepub.com
The use of the Python programming language for scientific computing has been gaining
momentum in the last years. The fact that it is compact and readable and its complete set of …

Harnessing the power of many: Extensible toolkit for scalable ensemble applications

V Balasubramanian, M Turilli, W Hu… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
Many scientific problems require multiple distinct computational tasks to be executed in
order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address …

Ensemble toolkit: Scalable and flexible execution of ensembles of tasks

V Balasubramanian, A Treikalis… - 2016 45th …, 2016 - ieeexplore.ieee.org
There are many science applications that require scalable task-level parallelism, support for
flexible execution and coupling of ensembles of simulations. Most high-performance system …

CGAT-core: a python framework for building scalable, reproducible computational biology workflows

AP Cribbs, S Luna-Valero, C George, IM Sudbery… - bioRxiv, 2019 - biorxiv.org
In the genomics era computational biologists regularly need to process, analyse and
integrate large and complex biomedical datasets. Analysis inevitably involves multiple …

uap: reproducible and robust HTS data analysis

C Kämpf, M Specht, A Scholz, SH Puppel, G Doose… - BMC …, 2019 - Springer
Background: A lack of reproducibility has been repeatedly criticized in computational
research. High throughput sequencing (HTS) data analysis is a complex multi-step process …

Scalable and cost-effective NGS genotyping in the cloud

Y Souilmi, AK Lancaster, JY Jung, E Rizzo… - BMC medical …, 2015 - Springer
Background: While next-generation sequencing (NGS) costs have plummeted in recent
years, cost and complexity of computation remain substantial barriers to the use of NGS in …

GenomeVIP: a cloud platform for genomic variant discovery and interpretation

RJ Mashl, AD Scott, K Huang… - Genome …, 2017 - genome.cshlp.org
Identifying genomic variants is a fundamental first step toward the understanding of the role
of inherited and acquired variation in disease. The accelerating growth in the corpus of …

Computational methods for the discovery and annotation of viral integrations

U Palatini, E Pischedda, M Bonizzoni - piRNA: Methods and Protocols, 2022 - Springer
The transfer of genetic material between viruses and eukaryotic cells is pervasive. Somatic
integrations of DNA viruses and retroviruses have been linked to persistent viral infection …