Exploring and evaluating real-world CXL: Use cases and system adoption

X Wang, J Liu, J Wu, S Yang, J Ren, B Shankar… - arXiv preprint arXiv …, 2024 - arxiv.org
Compute Express Link (CXL) is emerging as a promising memory interface technology.
However, its performance characteristics remain largely unclear due to the limited …

Soft error resilience at near-zero cost

J Zeng, SY Huang, J Liu, C Jung - Proceedings of the 38th ACM …, 2024 - dl.acm.org
Among existing schemes for soft error resilience, acoustic-sensor-based detection stands
out owing to its ability to prevent silent data corruption at low hardware cost. However, the …

A parallel programming assessment for stream processing applications on multi-core systems

G Andrade, D Griebler, R Santos… - Computer Standards & …, 2023 - Elsevier
Multi-core systems are present in nearly any computing device nowadays, and stream processing applications
are becoming recurrent workloads, demanding parallelism to achieve the desired quality of …

NAS Parallel Benchmarks with CUDA and beyond

G Araujo, D Griebler, DA Rockenbach… - Software: Practice …, 2023 - Wiley Online Library
NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the
evaluation of parallel hardware and software. Several research efforts from academia have …

Software resource disaggregation for HPC with serverless computing

M Copik, M Chrapek, L Schmid… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Aggregated HPC resources have rigid allocation systems and programming models which
struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to …

Benchmarking parallel programming for single-board computers

RB Hoffmann, D Griebler, R da Rosa Righi… - Future Generation …, 2024 - Elsevier
Within the computing continuum, SBCs (single-board computers) are essential in the Edge
and Fog, with many featuring multiple processing cores and GPU accelerators. In this way …

TrackFM: Far-out compiler support for a far memory world

BR Tauro, B Suchy, S Campanoni, P Dinda… - Proceedings of the 29th …, 2024 - dl.acm.org
Large memory workloads with favorable locality of reference can benefit by extending the
memory hierarchy across machines. Systems that enable such far memory configurations …

GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism

DA Rockenbach, G Araujo, D Griebler… - Computer Standards & …, 2025 - Elsevier
The evolution of Graphics Processing Units (GPUs) has allowed the industry to
overcome long-lasting problems and challenges. Many belong to the stream processing …

SpEQ: Translation of sparse codes using equivalences

A Laird, B Liu, N Bjørner, MM Dehnavi - Proceedings of the ACM on …, 2024 - dl.acm.org
We present SpEQ, a quick and correct strategy for detecting semantics in sparse codes and
enabling automatic translation to high-performance library calls or domain-specific …

MUPPET: Optimizing performance in OpenMP via mutation testing

D Miao, I Laguna, G Georgakoudis… - Proceedings of the 15th …, 2024 - dl.acm.org
Performance optimization continues to be a challenge in modern HPC software. Existing
performance optimization techniques, including profiling-based and auto-tuning techniques …