Copperhead: compiling an embedded data parallel language

B Catanzaro, M Garland, K Keutzer - … of the 16th ACM symposium on …, 2011 - dl.acm.org
Modern parallel microprocessors deliver high performance on applications that expose
substantial fine-grained data parallelism. Although data parallelism is widely available in …

Heterogeneous computing and applications in deep learning: A survey

Q Wu, Y Shen, M Zhang - … of the 5th International Conference on …, 2022 - dl.acm.org
With the rapid development of deep learning, a variety of neural network models emerge
endlessly, which leads to a huge demand for computing resources. For the intensive …

APPy: Annotated Parallelism for Python on GPUs

T Zhou, J Shirako, V Sarkar - Proceedings of the 33rd ACM SIGPLAN …, 2024 - dl.acm.org
GPUs are increasingly being used to speed up Python applications in the scientific
computing and machine learning domains. Currently, the two common approaches to …

Bohrium: unmodified NumPy code on CPU, GPU, and cluster

MRB Kristensen, SAF Lund, T Blum… - 4th Workshop on …, 2013 - researchgate.net
In this paper we introduce Bohrium, a runtime system for mapping array operations onto a
number of different hardware platforms, from multi-core systems to clusters and GPU …

Adaptive input-aware compilation for graphics engines

M Samadi, A Hormati, M Mehrara, J Lee… - Proceedings of the 33rd …, 2012 - dl.acm.org
While graphics processing units (GPUs) provide low-cost and efficient platforms for
accelerating high performance computations, the tedious process of performance tuning …

Incorporating augmented reality content in Engineering Design Graphics materials

J Dorribo-Camba, M Contero - 2013 IEEE Frontiers in …, 2013 - ieeexplore.ieee.org
This paper describes the development and integration of augmented reality content with
traditional Engineering Design Graphics materials, and presents the results of a preliminary …

A fresh look at retiming via clock skew optimization

RB Deokar, SS Sapatnekar - Proceedings of the 32nd annual ACM/IEEE …, 1995 - dl.acm.org
The introduction of clock skew at an edge-triggered flip-flop has an effect that is similar to the
movement of the flip-flop across combinational logic module boundaries, and these are …

Bohrium: a virtual machine approach to portable parallelism

MRB Kristensen, SAF Lund, T Blum… - … Parallel & Distributed …, 2014 - ieeexplore.ieee.org
In this paper we introduce Bohrium, a runtime system for mapping vector operations onto a
number of different hardware platforms, from simple multi-core systems to clusters and GPU …

Wireless link SNR mapping onto an indoor testbed

J Lei, R Yates, L Greenstein… - … Conference on Testbeds …, 2005 - ieeexplore.ieee.org
To facilitate a broad range of experimental research on novel protocols and application
concepts, we consider an indoor wireless testbed to emulate the performance of real-world …

CuNesl: Compiling nested data-parallel languages for SIMT architectures

Y Zhang, F Mueller - 2012 41st International Conference on …, 2012 - ieeexplore.ieee.org
Data-parallel languages feature fine-grained parallel primitives that can be supported by
compilers targeting modern many-core architectures where data parallelism must be …