FILM-QNN: Efficient FPGA acceleration of deep neural networks with intra-layer, mixed-precision quantization M Sun, Z Li, A Lu, Y Li, SE Chang, X Ma, X Lin, Z Fang Proceedings of the 2022 ACM/SIGDA International Symposium on Field …, 2022 | 77 | 2022 |
Auto-ViT-Acc: An FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization Z Li, M Sun, A Lu, H Ma, G Yuan, Y Xie, H Tang, Y Li, M Leeser, Z Wang, ... 2022 32nd International Conference on Field-Programmable Logic and …, 2022 | 66 | 2022 |
HeatViT: Hardware-efficient adaptive token pruning for vision transformers P Dong, M Sun, A Lu, Y Xie, K Liu, Z Kong, X Meng, Z Li, X Lin, Z Fang, ... 2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023 | 58 | 2023 |
Demystifying the memory system of modern datacenter FPGAs for software programmers through microbenchmarking A Lu, Z Fang, W Liu, L Shannon The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays …, 2021 | 33 | 2021 |
CHIP-KNN: A configurable and high-performance k-nearest neighbors accelerator on cloud FPGAs A Lu, Z Fang, N Farahpour, L Shannon 2020 International Conference on Field-Programmable Technology (ICFPT), 139-147, 2020 | 33 | 2020 |
Rethinking integer divider design for FPGA-based soft-processors E Matthews, A Lu, Z Fang, L Shannon 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom …, 2019 | 32 | 2019 |
SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-Based FPGAs X Tian, Z Ye, A Lu, L Guo, Y Chi, Z Fang ACM Trans. Reconfigurable Technol. Syst. 16 (2), 2023 | 9 | 2023 |
SQL2FPGA: Automated Acceleration of SQL Query Processing on Modern CPU-FPGA Platforms A Lu, J Narendra Agrawal, Z Fang ACM Transactions on Reconfigurable Technology and Systems 17 (3), 1-28, 2024 | 8 | 2024 |
CHIP-KNNv2: A Configurable and High-Performance K-Nearest Neighbors Accelerator on HBM-based FPGAs K Liu, A Lu, K Samtani, Z Fang, L Guo ACM Transactions on Reconfigurable Technology and Systems 16 (4), 1-26, 2023 | 7 | 2023 |
SASA: A scalable and automatic stencil acceleration framework for optimized hybrid spatial and temporal parallelism on HBM-based FPGAs X Tian, Z Ye, A Lu, L Guo, Y Chi, Z Fang ACM Transactions on Reconfigurable Technology and Systems 16 (2), 1-33, 2023 | 7 | 2023 |
FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization: Late breaking results M Sun, Z Li, A Lu, H Ma, G Yuan, Y Xie, H Tang, Y Li, M Leeser, Z Wang, ... Proceedings of the 59th ACM/IEEE Design Automation Conference, 1394-1395, 2022 | 7 | 2022 |
Demystifying the soft and hardened memory systems of modern FPGAs for software programmers through microbenchmarking A Lu, Z Fang, L Shannon ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (4), 1-33, 2022 | 7 | 2022 |
Quasar-ViT: Hardware-oriented quantization-aware architecture search for vision transformers Z Li, A Lu, Y Xie, Z Kong, M Sun, H Tang, ZJ Xue, P Dong, C Ding, ... Proceedings of the 38th ACM International Conference on Supercomputing, 324-337, 2024 | 5 | 2024 |
You already have it: A generator-free low-precision DNN training framework using stochastic rounding G Yuan, SE Chang, Q Jin, A Lu, Y Li, Y Wu, Z Kong, Y Xie, P Dong, M Qin, ... European Conference on Computer Vision, 34-51, 2022 | 4 | 2022 |
ESRU: Extremely Low-Bit and Hardware-Efficient Stochastic Rounding Unit Design for Low-Bit DNN Training SE Chang, G Yuan, A Lu, M Sun, Y Li, X Ma, Z Li, Y Xie, M Qin, X Lin, ... 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6, 2023 | 3 | 2023 |
Quick-Div: Rethinking Integer Divider Design for FPGA-based Soft-processors E Matthews, A Lu, Z Fang, L Shannon ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (3), 1-27, 2022 | 2 | 2022 |
Late Breaking Results: FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization M Sun, Z Li, A Lu, H Ma, G Yuan, Y Xie, H Tang, Y Li, M Leeser, Z Wang, ... Proceedings of the 59th Design Automation Conference (DAC), 2022 | 2 | 2022 |
BitBlender: Scalable Bloom Filter Acceleration on FPGAs with Dynamic Scheduling K Liu, A Lu, Z Fang 2024 34th International Conference on Field-Programmable Logic and …, 2024 | | 2024 |
FORC: A High-Throughput Streaming FPGA Accelerator for Optimized Row Columnar File Decoders in Big Data Engines A Wadood, A Lu, K Zhang, Z Fang 2024 34th International Conference on Field-Programmable Logic and …, 2024 | | 2024 |
Hardware-efficient stochastic rounding unit design for DNN training: late breaking results SE Chang, G Yuan, A Lu, M Sun, Y Li, X Ma, Z Li, Y Xie, M Qin, X Lin, ... Proceedings of the 59th ACM/IEEE Design Automation Conference, 1396-1397, 2022 | | 2022 |