Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Deep neural network approximation for custom hardware: Where we've been, where we're going
Deep neural networks have proven to be particularly effective in visual and audio
recognition tasks. Existing models tend to be computationally expensive and memory …
EIE: Efficient inference engine on compressed deep neural network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and
are both computationally and memory intensive, making them difficult to deploy on …
Multicore bundle adjustment
We present the design and implementation of new inexact Newton type Bundle Adjustment
algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene …
AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and their relationships are …
Memory coherence in shared virtual memory systems
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Recent advances in computing have led to an explosion in the amount of data being
generated. Processing the ever-growing data in a timely manner has made throughput …
SparTen: A sparse tensor accelerator for convolutional neural networks
Convolutional neural networks (CNNs) are emerging as powerful tools for image
processing. Recent machine learning work has reduced CNNs' compute and data volumes …
The Chinese Wall Security Policy
DFC Brewer, MJ Nash - S&P, 1989 - facweb.iitkgp.ac.in
Everyone who has seen the movie Wall Street will have seen a commercial security policy in
action. The recent work of Clark and Wilson and the WIPCIS initiative (the Workshop on …
Scalable GPU graph traversal
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-
level graph analysis algorithms. It is also representative of a class of parallel computations …