Mitigating Write Disturbance in Non-Volatile Memory via Coupling Machine Learning with Out-of-Place Updates

R Wu, Z Shen, Z Yang, J Shu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Non-volatile memory (NVM) opens up new opportunities to resolve scaling restrictions of
main memory, yet it is still hindered by the write disturbance (WD) problem. The WD problem …

VeriGOOD-ML: An open-source flow for automated ML hardware synthesis

H Esmaeilzadeh, S Ghodrati, J Gu… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
This paper introduces VeriGOOD-ML, an automated methodology for generating Verilog
with no human in the loop, starting from a high-level description of a machine learning (ML) …

A reschedulable dataflow-SIMD execution for increased utilization in CGRA cross-domain acceleration

C Yin, N Jing, J Jiang, Q Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward
cross-domain acceleration, control flow and memory accesses often degrade the processing …

Energy-efficient hardware acceleration of shallow machine learning applications

Z Zeng, SS Sapatnekar - 2023 Design, Automation & Test in …, 2023 - ieeexplore.ieee.org
ML accelerators have largely focused on building general platforms for deep neural
networks (DNNs), but less so on shallow machine learning (SML) algorithms. This paper …

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

J Cho, M Kim, H Choi, G Heo… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Recently, there has been an extensive research effort in building efficient large language
model (LLM) inference serving systems. These efforts not only include innovations in the …

End-to-end synthesis of dynamically controlled machine learning accelerators

S Curzel, NB Agostini, VG Castellana… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Edge systems are required to autonomously make real-time decisions based on large
quantities of input data under strict power, performance, area, and other constraints. Meeting …

Improving utilization of dataflow unit for multi-batch processing

Z Fan, W Li, Z Wang, Y Yang, X Ye, D Fan… - ACM Transactions on …, 2024 - dl.acm.org
Dataflow architectures can achieve much better performance and higher efficiency than
general-purpose core, approaching the performance of a specialized design while retaining …

A 28-nm Software-Defined Accelerator Chip With Circuit-Pipeline Scaling and Intrinsic Physical Unclonable Function Enabling Secure Configuration

J Zhu, B Yang, L Chen, J Chen, Y Zhang… - IEEE Journal of Solid …, 2025 - ieeexplore.ieee.org
As emerging applications raise ever-boosting and varying computational demand, the
reconfigurable accelerator is becoming prevalent due to balanced performance, efficiency …

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training

M Adnan, A Phanishayee, J Kulkarni, PJ Nair… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a novel technique to search for hardware architectures of
accelerators optimized for end-to-end training of deep neural networks (DNNs). Our …

APPEND: Rethinking ASIP Synthesis in the Era of AI

C Li, Y Wang, H Li, Y Han - 2023 60th ACM/IEEE Design …, 2023 - ieeexplore.ieee.org
Application-specific instruction-set processors (ASIPs) have been widely used to speed up
specific applications based on general-purpose processor (CPU) ISA-extension and …