Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

S Raschka, J Patterson, C Nolet - Information, 2020 - mdpi.com
Smarter applications are making better use of the insights gleaned from data, having an
impact on every industry and research discipline. At the core of this revolution lie the tools …

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High‐Performance Computing Systems

P Czarnul, J Proficz, K Drypczewski - Scientific Programming, 2020 - Wiley Online Library
This paper provides a review of contemporary methodologies and APIs for parallel
programming, with representative technologies selected in terms of target system type …

Actively triggerable metals via liquid metal embrittlement for biomedical applications

VR Feig, E Remlova, B Muller… - Advanced …, 2023 - Wiley Online Library
Actively triggerable materials, which break down upon introduction of an exogenous
stimulus, enable precise control over the lifetime of biomedical technologies, as well as …

Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE

K Mahajan, CH Chu, S Sridharan, A Akella - 20th USENIX Symposium …, 2023 - usenix.org
Emerging ML training deployments are trending towards larger models, and hybrid-parallel
training that is not just dominated by compute-intensive all-reduce for gradient aggregation …

GASNet-EX: A high-performance, portable communication library for exascale

D Bonachea, PH Hargrove - … Workshop on Languages and Compilers for …, 2018 - Springer
Partitioned Global Address Space (PGAS) models, typified by languages such as
Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(eg, GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth …

The OpenMP cluster programming model

H Yviquel, M Pereira, E Francesquini… - … Proceedings of the …, 2022 - dl.acm.org
Despite the various research initiatives and proposed programming models, efficient
solutions for parallel programming in HPC clusters still rely on a complex combination of …

GPU accelerated feature engineering and training for recommender systems

B Schifferer, G Titericz, C Deotte, C Henkel… - Proceedings of the …, 2020 - dl.acm.org
In this paper we present our 1st-place solution to the RecSys Challenge 2020, which focused
on the prediction of user behavior, specifically the interaction with content, on this year's …

Ad hoc file systems for high-performance computing

A Brinkmann, K Mohror, W Yu, P Carns… - Journal of Computer …, 2020 - Springer
Storage backends of parallel compute clusters are still based mostly on magnetic disks,
while newer and faster storage technologies such as flash-based SSDs or non-volatile …

Bringing UMAP closer to the speed of light with GPU acceleration

CJ Nolet, V Lafargue, E Raff, T Nanditale… - Proceedings of the …, 2021 - ojs.aaai.org
The Uniform Manifold Approximation and Projection (UMAP) algorithm has become
widely popular for its ease of use, quality of results, and support for exploratory …