A survey on distributed machine learning
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …
The future of computing beyond Moore's Law
J Shalf - Philosophical Transactions of the Royal Society …, 2020 - royalsocietypublishing.org
Moore's Law is a techno-economic model that has enabled the information technology
industry to double the performance and functionality of digital electronics roughly every 2 …
FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
Smarter applications are making better use of the insights gleaned from data, having an
impact on every industry and research discipline. At the core of this revolution lie the tools …
COIL: Revisit exact lexical match in information retrieval with contextualized inverted list
Classical information retrieval systems such as BM25 rely on exact lexical match and carry
out search efficiently with inverted list index. Recent neural IR models shift towards soft …
A suite of tutorials for the WESTPA rare-events sampling software [Article v1.0]
The weighted ensemble (WE) strategy has been demonstrated to be highly efficient in
generating pathways and rate constants for rare events such as protein folding and protein …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
An improved chain of spheres for exchange algorithm
In the present work, we describe a more accurate and efficient variant of the chain-of-
spheres algorithm (COSX) for exchange matrix computations. Higher accuracy for the …
qpOASES: A parametric active-set algorithm for quadratic programming
Many practical applications lead to optimization problems that can either be stated as
quadratic programming (QP) problems or require the solution of QP problems on a lower …
ExTensor: An accelerator for sparse tensor algebra
Generalized tensor algebra is a prime candidate for acceleration via customized ASICs.
Modern tensors feature a wide range of data sparsity, with the density of non-zero elements …