SHARP: A short-word hierarchical accelerator for robust and practical fully homomorphic encryption
Fully homomorphic encryption (FHE) is an emerging cryptographic technology that
guarantees the privacy of sensitive user data by enabling direct computations on encrypted …
Bingo spatial data prefetcher
Applications extensively use data objects with a regular and fixed layout, which leads to the
recurrence of access patterns over memory regions. Spatial data prefetching techniques …
Evaluation of hardware data prefetchers on server processors
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
GPU-NEST: Characterizing energy efficiency of multi-GPU inference servers
A Jahanshahi, HZ Sabzi, C Lau… - IEEE Computer …, 2020 - ieeexplore.ieee.org
Cloud inference systems have recently emerged as a solution to the ever-increasing
integration of AI-powered applications into the smart devices around us. The wide adoption …
Enhancing server efficiency in the face of killer microseconds
A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer
microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O …
BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …
BOW: Breathing operand windows to exploit bypassing in GPUs
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …
OSM: Off-chip shared memory for GPUs
Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache for
programmers, in each streaming multiprocessor to accelerate data sharing among the …
READY: A fine-grained multithreading overlay framework for modern CPU-FPGA dataflow applications
In this work, we propose a framework called REconfigurable Accelerator DeploY (READY),
the first framework to support polynomial runtime mapping of dataflow applications in high …
High performance and power efficient accelerator for cloud inference
J Yao, H Zhou, Y Zhang, Y Li, C Feng… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Facing the growing complexity of Deep Neural Networks (DNNs), high-performance and
power-efficient AI accelerators are desired to provide effective and affordable cloud …