Intriguing properties of quantization at scale
Emergent properties have been widely adopted as a term to describe behavior not present
in smaller models but observed in larger models (Wei et al., 2022a). Recent work suggests …
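To ground what post-training quantization means here, the following is a minimal, illustrative sketch of symmetric per-column int8 weight quantization (a toy example, not the paper's experimental setup); at larger scales a handful of outlier dimensions can dominate the per-column scales and hurt the quantized model.

```python
import numpy as np

def quantize_int8_per_column(w: np.ndarray):
    """Symmetric per-column int8 quantization of a 2-D weight matrix (toy example)."""
    # One scale per output column, chosen so the largest magnitude maps to 127.
    scale = np.abs(w).max(axis=0) / 127.0
    scale[scale == 0] = 1.0                      # guard against all-zero columns
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512)).astype(np.float32)
    w[:, 7] *= 50.0                              # emulate an outlier column
    q, s = quantize_int8_per_column(w)
    print("mean abs reconstruction error:", np.abs(w - dequantize(q, s)).mean())
```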
CoDG-ReRAM: An algorithm-hardware co-design to accelerate semi-structured GNNs on ReRAM
Graph Convolutional Networks (GCNs) have attracted wide attention and are applied to the real
world. However, due to the ever-growing graph data with significant irregularities, off-chip …
Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
The likelihood of encountering in-training failures rises substantially with larger Deep
Learning (DL) training workloads, leading to lost work and resource wastage. Such failures …
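The title points to shrinking checkpoints by quantizing their tensors at save time; below is a minimal sketch of that general idea using a single fixed int8 scheme (the dynamic precision selection the title refers to is not reproduced here).

```python
import numpy as np

def compress_checkpoint(state: dict[str, np.ndarray]) -> dict:
    """Quantize every float tensor in a checkpoint to symmetric int8 (sketch only)."""
    packed = {}
    for name, tensor in state.items():
        scale = float(np.abs(tensor).max()) / 127.0
        if scale == 0.0:
            scale = 1.0                          # all-zero tensor
        packed[name] = {"q": np.round(tensor / scale).astype(np.int8), "scale": scale}
    return packed

def restore_checkpoint(packed: dict) -> dict:
    """Dequantize back to float32 when resuming training after a failure."""
    return {name: entry["q"].astype(np.float32) * entry["scale"]
            for name, entry in packed.items()}
```

Stored this way, each float32 tensor occupies roughly a quarter of its original size, at the cost of a bounded rounding error per element.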
Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
In recent years, PIM-based mixed-signal accelerators have been proposed as energy- and
area-efficient solutions with ultra-high throughput to accelerate DNN computations …
Dynaquant: Compressing deep learning training checkpoints via dynamic quantization
With the increase in the scale of Deep Learning (DL) training workloads in terms of compute
resources and time consumption, the likelihood of encountering in-training failures rises …
BWA-NIMC: Budget-based Workload Allocation for Hybrid Near/In-Memory-Computing
To enable efficient computation for convolutional neural networks, in-memory computing
(IMC) has been proposed to perform computation within memory. However, the non-ideality …
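As a rough illustration of what a budget-based split between near-memory and in-memory units could look like, here is a toy greedy allocator; the layer fields, the capacity notion, and the policy are assumptions for illustration, not the BWA-NIMC algorithm.

```python
def allocate_layers(layers, imc_capacity_budget):
    """Toy budget-based split of DNN layers across a hybrid near/in-memory design.

    Layers assumed most tolerant of IMC non-ideality (lowest sensitivity) are
    mapped to the in-memory arrays until the capacity budget is exhausted;
    the remaining layers run on the near-memory digital units.
    """
    ranked = sorted(layers, key=lambda layer: layer["sensitivity"])
    plan, used = {}, 0
    for layer in ranked:
        if used + layer["macs"] <= imc_capacity_budget:
            plan[layer["name"]] = "in-memory"
            used += layer["macs"]
        else:
            plan[layer["name"]] = "near-memory"
    return plan

if __name__ == "__main__":
    layers = [
        {"name": "conv1", "macs": 100, "sensitivity": 0.9},
        {"name": "conv2", "macs": 200, "sensitivity": 0.2},
        {"name": "fc",    "macs": 50,  "sensitivity": 0.6},
    ]
    print(allocate_layers(layers, imc_capacity_budget=250))
```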
An algorithm-hardware co-design framework to overcome imperfections of mixed-signal DNN accelerators
In recent years, processing-in-memory (PIM) based mixed-signal designs have been
proposed as energy- and area-efficient solutions with ultra-high throughput to accelerate …
Reliable edge intelligence in unreliable environment
A key challenge for deployment of artificial intelligence (AI) in real-time safety-critical
systems at the edge is to ensure reliable performance even in unreliable environments. This …
DEA-NIMC: Dynamic Energy-Aware Policy for Near/In-Memory Computing Hybrid Architecture
In-memory computing (IMC) has become the current trend to accelerate the inference of
deep neural networks (DNNs). Nonetheless, IMC suffers from variations that significantly …
Hessian-Based Parameter Quantization Method for BERT
W Byun, S Mukhopadhyay - 2023 IEEE 66th International …, 2023 - ieeexplore.ieee.org
Transformer-based language models have shown outstanding performance in various NLP
tasks, but using them on edge devices is very challenging due to their notorious memory …
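As background on the general idea of Hessian-based sensitivity, here is a sketch of a Hutchinson-style Hessian-trace estimate of the kind used in HAWQ-style mixed-precision methods, followed by a simple bit-assignment rule; the cited paper's exact criterion and assignment policy may differ.

```python
import torch

def hessian_trace_estimate(loss_fn, params, n_samples: int = 8) -> float:
    """Hutchinson estimator of the Hessian trace over a list of parameter tensors.

    loss_fn: a no-argument callable that recomputes the scalar loss.
    params:  tensors with requires_grad=True (e.g., one layer's weights).
    """
    trace = 0.0
    for _ in range(n_samples):
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Rademacher probe vectors (+1/-1 with equal probability).
        v = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params)          # Hessian-vector product
        trace += sum((h * vi).sum().item() for h, vi in zip(hv, v))
    return trace / n_samples

def assign_bits(traces: dict[str, float]) -> dict[str, int]:
    """Give flatter layers (smaller Hessian trace) fewer bits, sharper layers more."""
    order = sorted(traces, key=traces.get)
    return {name: (4 if rank < len(order) // 2 else 8)
            for rank, name in enumerate(order)}
```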