Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
Smarter applications are making better use of the insights gleaned from data, having an
impact on every industry and research discipline. At the core of this revolution lie the tools …
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High‐Performance Computing Systems
This paper provides a review of contemporary methodologies and APIs for parallel
programming, with representative technologies selected in terms of target system type …
Actively triggerable metals via liquid metal embrittlement for biomedical applications
Actively triggerable materials, which break down upon introduction of an exogenous
stimulus, enable precise control over the lifetime of biomedical technologies, as well as …
Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
Emerging ML training deployments are trending towards larger models, and hybrid-parallel
training that is not just dominated by compute-intensive all-reduce for gradient aggregation …
GASNet-EX: A high-performance, portable communication library for exascale
Partitioned Global Address Space (PGAS) models, typified by languages such as
Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key …
Enabling compute-communication overlap in distributed deep learning training platforms
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth …
The OpenMP cluster programming model
Despite the various research initiatives and proposed programming models, efficient
solutions for parallel programming in HPC clusters still rely on a complex combination of …
GPU accelerated feature engineering and training for recommender systems
In this paper we present our 1st place solution of the RecSys Challenge 2020 which focused
on the prediction of user behavior, specifically the interaction with content, on this year's …
Ad hoc file systems for high-performance computing
Storage backends of parallel compute clusters are still based mostly on magnetic disks,
while newer and faster storage technologies such as flash-based SSDs or non-volatile …
Bringing UMAP closer to the speed of light with GPU acceleration
The Uniform Manifold Approximation and Projection (UMAP) algorithm has become
widely popular for its ease of use, quality of results, and support for exploratory …