Full accounting for verifiable outsourcing

RS Wahby, Y Ji, AJ Blumberg, A Shelat… - Proceedings of the …, 2017 - dl.acm.org
Systems for verifiable outsourcing incur costs for a prover, a verifier, and precomputation;
outsourcing makes sense when the combination of these costs is cheaper than not …

Symphony: Orchestrating sparse and dense tensors with hierarchical heterogeneous processing

M Pellauer, J Clemons, V Balaji, N Crago… - ACM Transactions on …, 2023 - dl.acm.org
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep
learning, graph and data analytics, and scientific computing. Current high-performance …

T4: Compiling sequential code for effective speculative parallelization in hardware

VA Ying, MC Jeffrey, D Sanchez - 2020 ACM/IEEE 47th Annual …, 2020 - ieeexplore.ieee.org
Multicores are now ubiquitous, but programmers still write sequential code. Speculative
parallelization is an enticing approach to parallelize code while retaining the ease of …

HELIX-UP: Relaxing program semantics to unleash parallelization

S Campanoni, G Holloway, GY Wei… - 2015 IEEE/ACM …, 2015 - ieeexplore.ieee.org
Automatic generation of parallel code for general-purpose commodity processors is a
challenging computational problem. Nevertheless, there is a lot of latent thread-level …

Inter-thread communication in multithreaded, reconfigurable coarse-grain arrays

D Voitsechov, O Port, Y Etsion - 2018 51st Annual IEEE/ACM …, 2018 - ieeexplore.ieee.org
Traditional von Neumann GPGPUs only allow threads to communicate through memory on a
group-to-group basis. In this model, a group of producer threads writes intermediate values …

Predicting new workload or CPU performance by analyzing public datasets

Y Wang, V Lee, GY Wei, D Brooks - ACM Transactions on Architecture …, 2019 - dl.acm.org
The marketplace for general-purpose microprocessors offers hundreds of functionally similar
models, differing by traits like frequency, core count, cache size, memory bandwidth, and …

Phloem: Automatic acceleration of irregular applications with fine-grain pipeline parallelism

QM Nguyen, D Sanchez - 2023 IEEE International Symposium …, 2023 - ieeexplore.ieee.org
Irregular applications are increasingly common in diverse domains, like graph analytics and
sparse linear algebra. Accelerating these applications is challenging because of their …

Trireme: Exploration of hierarchical multi-level parallelism for hardware acceleration

G Zacharopoulos, A Ejjeh, Y Jing, EY Yang… - ACM Transactions on …, 2023 - dl.acm.org
The design of heterogeneous systems that include domain specific accelerators is a
challenging and time-consuming process. While taking into account area constraints …

CARAT CAKE: Replacing paging via compiler/kernel cooperation

B Suchy, S Ghosh, D Kersnar, S Chai… - Proceedings of the 27th …, 2022 - dl.acm.org
Virtual memory, specifically paging, is undergoing significant innovation due to being
challenged by new demands from modern workloads. Recent work has demonstrated an …

Cooperative caching for GPUs

S Dublish, V Nagarajan, N Topham - ACM Transactions on Architecture …, 2016 - dl.acm.org
The rise of general-purpose computing on GPUs has influenced architectural innovation on
them. The introduction of an on-chip cache hierarchy is one such innovation. High L1 miss …