Supporting very large models using automatic dataflow graph partitioning

M Wang, C Huang, J Li - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org
This paper presents Tofu, a system that partitions very large DNN models across multiple
GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow …
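The snippet describes splitting a model across GPUs so that each device holds less of it. The hypothetical Python below is a minimal sketch of that basic idea only. It is not Tofu's partitioning algorithm. It splits one linear layer's weight matrix column-wise across simulated devices, and plain lists stand in for GPU tensors. Each shard produces a slice of the output, and concatenating the slices reproduces the unpartitioned product. The names `split_columns` and `partitioned_matmul` are illustrative.

```python
# Hypothetical sketch (not Tofu's algorithm): shard a linear layer's weight
# column-wise across simulated "devices" so each stores only part of it.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split w into num_devices contiguous column blocks (even split assumed)."""
    cols = len(w[0])
    if cols % num_devices:
        raise ValueError("column count must be divisible by num_devices")
    step = cols // num_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]


def partitioned_matmul(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Each simulated device multiplies the full input by its own weight shard;
    concatenating the partial outputs row by row recovers x @ w."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


def elements(m: Matrix) -> int:
    """Number of stored values, a stand-in for per-device memory."""
    return sum(len(row) for row in m)


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    assert partitioned_matmul(x, w, 2) == matmul(x, w)
    print(elements(w), [elements(s) for s in split_columns(w, 2)])  # 8 [4, 4]
```

In this sketch each simulated device stores half of the weight. However, the input `x` is replicated on every device and the output is split by columns, so a real system would also have to account for the communication that this layout implies.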

An approach for quantitative analysis of application-specific dataflow architectures

B Kienhuis, E Deprettere, K Vissers… - Proceedings IEEE …, 1997 - ieeexplore.ieee.org
In this paper we present an approach for quantitative analysis of application-specific
dataflow architectures. The approach allows the designer to rate design alternatives in a …

[BOOK] The compiler design handbook: optimizations and machine code generation

YN Srikant, P Shankar - 2002 - taylorfrancis.com
The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …

A survey on auto-parallelism of large-scale deep learning training

P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has gained great success in recent years, leading to state-of-the-art
performance in the research community and industrial fields like computer vision and natural …

Automatic data layout for High Performance Fortran

K Kennedy, U Kremer - Proceedings of the 1995 ACM/IEEE conference …, 1995 - dl.acm.org
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel
programming. The goal of HPF is to provide a simple yet efficient machine-independent …

Automatic data layout using 0-1 integer programming

RE Bixby, K Kennedy, U Kremer - IFIP Transactions, 1994 - Citeseer
The goal of languages like Fortran D or HPF is to provide a simple yet efficient
machine-independent parallel programming model. By shifting much of the burden of …
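The entry's title names 0-1 integer programming as the tool for choosing data layouts. The sketch below shows that kind of layout-selection problem under invented assumptions, and it is not the paper's formulation. The assumptions are:

- A program runs as a sequence of phases.
- Each phase has an execution cost under every candidate layout.
- Switching layouts between consecutive phases adds a remapping cost.

The hypothetical `best_layouts` function enumerates every assignment of one layout per phase. That set is the feasible set of a 0-1 program in which x[p][l] = 1 iff phase p uses layout l, with exactly one x[p][l] set per phase. The layout names and costs are made up.

```python
# Hypothetical sketch: choose one data layout per program phase to minimize
# execution cost plus remapping cost between consecutive phases. Enumerating
# all choices is the brute-force counterpart of a 0-1 formulation where
# x[p][l] = 1 iff phase p uses layout l (exactly one layout per phase).
from itertools import product
from typing import Dict, List, Tuple


def best_layouts(
    exec_cost: List[Dict[str, int]],
    remap_cost: Dict[Tuple[str, str], int],
) -> Tuple[int, Tuple[str, ...]]:
    """Return (minimal total cost, layout chosen for each phase).

    Assumes at least one phase and the same candidate layouts in every phase.
    Pairs missing from remap_cost (e.g. keeping the same layout) cost 0.
    """
    layouts = sorted(exec_cost[0])
    best = None
    for choice in product(layouts, repeat=len(exec_cost)):
        total = sum(exec_cost[p][layout] for p, layout in enumerate(choice))
        total += sum(remap_cost.get((a, b), 0) for a, b in zip(choice, choice[1:]))
        if best is None or total < best[0]:
            best = (total, choice)
    return best


if __name__ == "__main__":
    # Invented costs: phase 2 strongly prefers "cyclic", phases 1 and 3 "block".
    exec_cost = [
        {"block": 10, "cyclic": 30},
        {"block": 40, "cyclic": 15},
        {"block": 10, "cyclic": 30},
    ]
    remap_cost = {("block", "cyclic"): 10, ("cyclic", "block"): 10}
    print(best_layouts(exec_cost, remap_cost))  # (55, ('block', 'cyclic', 'block'))
```

With a remapping cost of 10, switching layouts for the middle phase pays off. Raising the remapping cost to 20 makes keeping "block" throughout the cheaper choice (total 60). Enumeration grows as L^P for L layouts and P phases. For realistic sizes, it is more practical to pose the choice as a 0-1 integer program and hand it to a solver.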

Automatic data layout generation and kernel mapping for CPU+GPU architectures

D Majeti, KS Meel, R Barik, V Sarkar - Proceedings of the 25th …, 2016 - dl.acm.org
The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic
data layout generation owing to the fact that data layouts have a large impact on …

Pattern‐Driven Automatic Parallelization

CW Kessler - Scientific Programming, 1996 - Wiley Online Library
This article describes a knowledge‐based system for automatic parallelization of a wide
class of sequential numerical codes operating on vectors and dense matrices, and for …

Unifying data, model and hybrid parallelism in deep learning via tensor tiling

M Wang, C Huang, J Li - arXiv preprint arXiv:1805.04170, 2018 - arxiv.org
Deep learning systems have become vital tools across many fields, but the increasing model
sizes mean that training must be accelerated to maintain such systems' utility. Current …

Spartan: A distributed array framework with smart tiling

CC Huang, Q Chen, Z Wang, R Power, J Ortiz… - 2015 USENIX Annual …, 2015 - usenix.org
Application programmers in domains like machine learning, scientific computing, and
computational biology are accustomed to using powerful, high productivity array languages …