Genome-wide association studies for complex traits: consensus, uncertainty and challenges

MI McCarthy, GR Abecasis, LR Cardon… - Nature reviews …, 2008 - nature.com
The past year has witnessed substantial advances in understanding the genetic basis of
many common phenotypes of biomedical importance. These advances have been the result …

RandNLA: randomized numerical linear algebra

P Drineas, MW Mahoney - Communications of the ACM, 2016 - dl.acm.org
Communications of the ACM, June 2016, Vol. 59, No. 6, review articles. DOI:10.1145/2842602.
Randomization offers new …

Minor allele frequency thresholds strongly affect population structure inference with genomic data sets

E Linck, CJ Battey - Molecular Ecology Resources, 2019 - Wiley Online Library
A common method of minimizing errors in large DNA sequence data sets is to drop variable
sites with a minor allele frequency (MAF) below some specified threshold. Although …

Randomized algorithms for matrices and data

MW Mahoney - Foundations and Trends® in Machine …, 2011 - nowpublishers.com
Randomized algorithms for very large matrix problems have received a great deal of
attention in recent years. Much of this work was motivated by problems in large-scale data …

Fast approximation of matrix coherence and statistical leverage

P Drineas, M Magdon-Ismail, MW Mahoney… - The Journal of Machine …, 2012 - dl.acm.org
The statistical leverage scores of a matrix A are the squared row-norms of the matrix
containing its (top) left singular vectors and the coherence is the largest leverage score …

Revisiting the Nyström method for improved large-scale machine learning

A Gittens, MW Mahoney - The Journal of Machine Learning Research, 2016 - dl.acm.org
We reconsider randomized algorithms for the low-rank approximation of symmetric positive
semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data …

A statistical perspective on algorithmic leveraging

P Ma, MW Mahoney, B Yu - The Journal of Machine Learning Research, 2015 - dl.acm.org
One popular method for dealing with large-scale data sets is sampling. For example, by
using the empirical statistical leverage scores as an importance sampling distribution, the …

Application of SNPs for population genetics of nonmodel organisms: new opportunities and challenges

SJ Helyar, J Hemmer‐Hansen… - Molecular ecology …, 2011 - Wiley Online Library
Recent improvements in the speed, cost and accuracy of next generation sequencing are
revolutionizing the discovery of single nucleotide polymorphisms (SNPs). SNPs are …

Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change

W Miller, SC Schuster, AJ Welch, A Ratan… - Proceedings of the …, 2012 - pnas.org
Polar bears (PBs) are superbly adapted to the extreme Arctic environment and have become
emblematic of the threat to biodiversity from global climate change. Their divergence from …

Discovery of multi-dimensional modules by integrative analysis of cancer genomic data

S Zhang, CC Liu, W Li, H Shen, PW Laird… - Nucleic acids …, 2012 - academic.oup.com
Recent technology has made it possible to simultaneously perform multi-platform genomic
profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples …