Randomness in neural network training: Characterizing the impact of tooling
The quest for determinism in machine learning has disproportionately focused on
characterizing the impact of noise introduced by algorithmic design choices. In this work, we …
Demystifying TensorRT: Characterizing neural network inference engine on NVIDIA edge devices
Edge devices are seeing tremendous growth in sensing and computational capabilities.
Running state-of-the-art deep neural network (NN) based data processing on multi-core …
A software-defined tensor streaming multiprocessor for large-scale machine learning
We describe our novel commercial software-defined approach for large-scale
interconnection networks of tensor streaming processing (TSP) elements. The system …
Not all GPUs are created equal: characterizing variability in large-scale, accelerator-rich systems
Scientists are increasingly exploring and utilizing the massive parallelism of general-
purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters …
Universal checkpointing: Efficient and flexible checkpointing for large scale distributed training
X Lian, SA Jacobs, L Kurilenko, M Tanaka… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing checkpointing approaches seem ill-suited for distributed training even though
hardware limitations make model parallelism, i.e., sharding model state across multiple …
Reproducibility of machine learning: Terminology, recommendations and open issues
Reproducibility is one of the core dimensions that concur to deliver Trustworthy Artificial
Intelligence. Broadly speaking, reproducibility can be defined as the possibility to reproduce …
On The Fairness Impacts of Hardware Selection in Machine Learning
In the machine learning ecosystem, hardware selection is often regarded as a mere utility,
overshadowed by the spotlight on algorithms and data. This is especially relevant in …
DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines
Differentiable rendering is a technique used in an important emerging class of visual
computing applications that involves representing a 3D scene as a model that is trained from …
Only buffer when you need to: Reducing on-chip GPU traffic with reconfigurable local atomic buffers
In recent years, due to their wide availability and ease of programming, GPUs have emerged
as the accelerator of choice for a wide variety of applications including graph analytics and …
Optimistic Verifiable Training by Controlling Hardware Nondeterminism
The increasing compute demands of AI systems have led to the emergence of services that
train models on behalf of clients lacking necessary resources. However, ensuring …