A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding

PS Gromski, H Muhamadali, DI Ellis, Y Xu, E Correa… - Analytica chimica …, 2015 - Elsevier
The predominance of partial least squares-discriminant analysis (PLS-DA) used to analyze
metabolomics datasets (indeed, it is the most well-known tool to perform classification and …

Missing heritability and strategies for finding the underlying causes of complex disease

EE Eichler, J Flint, G Gibson, A Kong, SM Leal… - Nature reviews …, 2010 - nature.com
Although recent genome-wide studies have provided valuable insights into the genetic basis
of human disease, they have explained relatively little of the heritability of most complex …

Genome-wide modeling of polygenic risk score in colorectal cancer risk

M Thomas, LC Sakoda, M Hoffmeister… - The American journal of …, 2020 - cell.com
Accurate colorectal cancer (CRC) risk prediction models are critical for identifying
individuals at low and high risk of developing CRC, as they can then be offered targeted …

Random forest for bioinformatics

Y Qi - Ensemble machine learning: Methods and applications, 2012 - Springer
Modern biology has experienced an increased use of machine learning techniques for large
scale and complex biological data analysis. In the area of Bioinformatics, the Random Forest …

Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?

WG Touw, JR Bayjanov, L Overmars… - Briefings in …, 2013 - academic.oup.com
In the Life Sciences 'omics' data is increasingly generated by different high-throughput
technologies. Often only the integration of these data allows uncovering …

Machine learning for big data analytics in plants

C Ma, HH Zhang, X Wang - Trends in plant science, 2014 - cell.com
Rapid advances in high-throughput genomic technology have enabled biology to enter the
era of 'Big Data' (large datasets). The plant science community not only needs to build its …

Bat biology, genomes, and the Bat1K project: to generate chromosome-level genomes for all living bat species

EC Teeling, SC Vernes, LM Dávalos… - Annual review of …, 2018 - annualreviews.org
Bats are unique among mammals, possessing some of the rarest mammalian adaptations,
including true self-powered flight, laryngeal echolocation, exceptional longevity, unique …

Molecular pathological epidemiology of colorectal neoplasia: an emerging transdisciplinary and interdisciplinary field

S Ogino, AT Chan, CS Fuchs, E Giovannucci - Gut, 2011 - gut.bmj.com
Colorectal cancer is a complex disease resulting from somatic genetic and epigenetic
alterations, including locus-specific CpG island methylation and global DNA or LINE-1 …

What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics

AM Musolf, ER Holzinger, JD Malley, JE Bailey-Wilson - Human Genetics, 2022 - Springer
Genetic data have become increasingly complex within the past decade, leading
researchers to pursue increasingly complex questions, such as those involving epistatic …

Bioinformatics challenges for personalized medicine

GH Fernald, E Capriotti, R Daneshjou… - …, 2011 - academic.oup.com
Motivation: Widespread availability of low-cost, full genome sequencing will introduce new
challenges for bioinformatics. Results: This review outlines recent developments in …