Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Deep neural network approximation for custom hardware: Where we've been, where we're going

E Wang, JJ Davis, R Zhao, HC Ng, X Niu… - ACM Computing …, 2019 - dl.acm.org
Deep neural networks have proven to be particularly effective in visual and audio
recognition tasks. Existing models tend to be computationally expensive and memory …

EIE: Efficient inference engine on compressed deep neural network

S Han, X Liu, H Mao, J Pu, A Pedram… - ACM SIGARCH …, 2016 - dl.acm.org
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and
are both computationally and memory intensive, making them difficult to deploy on …

Multicore bundle adjustment

C Wu, S Agarwal, B Curless, SM Seitz - CVPR 2011, 2011 - ieeexplore.ieee.org
We present the design and implementation of new inexact Newton type Bundle Adjustment
algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene …

AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing

T Geng, A Li, R Shi, C Wu, T Wang, Y Li… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and their relationships are …

Memory coherence in shared virtual memory systems

K Li, P Hudak - ACM Transactions on Computer Systems (TOCS), 1989 - dl.acm.org
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

VW Lee, C Kim, J Chhugani, M Deisher, D Kim… - Proceedings of the 37th …, 2010 - dl.acm.org
Recent advances in computing have led to an explosion in the amount of data being
generated. Processing the ever-growing data in a timely manner has made throughput …

SparTen: A sparse tensor accelerator for convolutional neural networks

A Gondimalla, N Chesnut, M Thottethodi… - Proceedings of the …, 2019 - dl.acm.org
Convolutional neural networks (CNNs) are emerging as powerful tools for image
processing. Recent machine learning work has reduced CNNs' compute and data volumes …

The Chinese Wall Security Policy

DFC Brewer, MJ Nash - S&P, 1989 - facweb.iitkgp.ac.in
Everyone who has seen the movie Wall Street will have seen a commercial security policy in
action. The recent work of Clark and Wilson and the WIPCIS initiative (the Workshop on …

Scalable GPU graph traversal

D Merrill, M Garland, A Grimshaw - ACM Sigplan Notices, 2012 - dl.acm.org
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many
higher-level graph analysis algorithms. It is also representative of a class of parallel computations …