Program reconditioning: Avoiding undefined behaviour when finding and reducing compiler bugs

B Lecoeur, H Mohsin, AF Donaldson - Proceedings of the ACM on …, 2023 - dl.acm.org
We introduce program reconditioning, a method for allowing program generation and
differential testing to be used to find miscompilation bugs, and test-case reduction to be used …

Portable inter-workgroup barrier synchronisation for GPUs

T Sorensen, AF Donaldson, M Batty… - Proceedings of the …, 2016 - dl.acm.org
Despite the growing popularity of GPGPU programming, there is not yet a portable and
formally-specified barrier that one can use to synchronise across workgroups. Moreover, the …

Towards Unified Analysis of GPU Consistency

H Tong, N Gavrilenko… - 29th ACM …, 2024 - hernanponcedeleon.github.io
After more than 30 years of research, there is a solid understanding of the consistency
guarantees given by CPU systems. Unfortunately, the same is not yet true for GPUs. The …

Parallel fractal image compression using quadtree partition with task and dynamic parallelism

FJ Hernandez-Lopez, O Muñiz-Pérez - Journal of Real-Time Image …, 2022 - Springer
Fractal image compression is a lossy compression technique based on the iterative function
system, which can be used to reduce the storage space and increase the speed of data …

GPUHarbor: Testing GPU memory consistency at large (experience paper)

R Levine, M Cho, D McKee, A Quinn… - Proceedings of the 32nd …, 2023 - dl.acm.org
Memory consistency specifications (MCSs) are a difficult, yet critical, part of a concurrent
programming framework. Existing MCS testing tools are not immediately accessible, and …

Automated test generation for OpenCL kernels using fuzzing and constraint solving

C Peng, A Rajan - Proceedings of the 13th Annual Workshop on General …, 2020 - dl.acm.org
Graphics Processing Units (GPUs) are massively parallel processors offering performance
acceleration and energy efficiency unmatched by current processors (CPUs) in computers …

Redwood: Flexible and Portable Heterogeneous Tree Traversal Workloads

Y Xu, A Li, T Sorensen - 2023 IEEE International Symposium …, 2023 - ieeexplore.ieee.org
Shared memory heterogeneous systems are now mainstream, with nearly every mobile
phone and tablet containing integrated processing units. However, developing applications …

GPU schedulers: how fair is fair enough?

T Sorensen, H Evrard… - … on Concurrency Theory …, 2018 - drops.dagstuhl.de
Blocking synchronisation idioms, e.g. mutexes and barriers, play an important role in
concurrent programming. However, systems with semi-fair schedulers, e.g. graphics …

Training progressively binarizing deep networks using FPGAs

C Lammie, W Xiang, MR Azghadi - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
While hardware implementations of inference routines for Binarized Neural Networks
(BNNs) are plentiful, current realizations of efficient BNN hardware training accelerators …

CLTestCheck: Measuring test effectiveness for GPU kernels

C Peng, A Rajan - … : 22nd International Conference, FASE 2019, Held as …, 2019 - Springer
Massive parallelism, and energy efficiency of GPUs, along with advances in their
programmability with OpenCL and CUDA programming models have made them attractive …