A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Floem: A programming system for NIC-Accelerated network applications

PM Phothilimthana, M Liu, A Kaufmann… - … USENIX Symposium on …, 2018 - usenix.org
Developing server applications that offload computation and data to a NIC accelerator is
laborious because one has to explore the design space of decisions about data placement …

Energy profiles of Java collections classes

S Hasan, Z King, M Hafiz, M Sayagh, B Adams… - Proceedings of the 38th …, 2016 - dl.acm.org
We created detailed profiles of the energy consumed by common operations done on Java
List, Map, and Set abstractions. The results show that the alternative data types for these …

SEEDS: A software engineer's energy-optimization decision support framework

I Manotas, L Pollock, J Clause - … of the 36th International Conference on …, 2014 - dl.acm.org
Reducing the energy usage of software is becoming more important in many environments,
in particular, battery-powered mobile devices, embedded systems and data centers. Recent …

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

M Steuwer, C Fensch, S Lindley, C Dubach - ACM SIGPLAN Notices, 2015 - dl.acm.org
Computers have become increasingly complex with the emergence of heterogeneous
hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous …

AutoTM: Automatic tensor movement in heterogeneous memory systems using integer linear programming

M Hildebrand, J Khan, S Trika, J Lowe-Power… - Proceedings of the …, 2020 - dl.acm.org
Memory capacity is a key bottleneck for training large scale neural networks. Intel® Optane™
DC PMM (persistent memory modules) which are available as NVDIMMs are a …

Adaptive heterogeneous scheduling for integrated GPUs

R Kaleem, R Barik, T Shpeisman, BT Lewis… - Proceedings of the 23rd …, 2014 - dl.acm.org
Many processors today integrate a CPU and GPU on the same die, which allows them to
share resources like physical memory and lowers the cost of CPU-GPU communication. As …

A flexible approach to autotuning multi-pass machine learning compilers

PM Phothilimthana, A Sabne, N Sarda… - 2021 30th …, 2021 - ieeexplore.ieee.org
Search-based techniques have been demonstrated effective in solving complex optimization
problems that arise in domain-specific compilers for machine learning (ML). Unfortunately …

Regularized least absolute deviations regression and an efficient algorithm for parameter tuning

L Wang, MD Gordon, J Zhu - Sixth International Conference on …, 2006 - ieeexplore.ieee.org
Linear regression is one of the most important and widely used techniques for data analysis.
However, sometimes people are not satisfied with it because of the following two limitations …

Data partitioning on multicore and multi-GPU platforms using functional performance models

Z Zhong, V Rychkov… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
Heterogeneous multiprocessor systems, which are composed of a mix of processing
elements, such as commodity multicore processors, graphics processing units (GPUs), and …