Exploiting hardware utilization and adaptive dataflow for efficient sparse convolution in 3D point clouds K Hong, Z Yu, G Dai, X Yang, Y Lian, N Xu, Y Wang Proceedings of Machine Learning and Systems 5, 428-441, 2023 | 10 | 2023 |
Large language model inference acceleration: A comprehensive hardware perspective J Li, J Xu, S Huang, Y Chen, W Li, J Liu, Y Lian, J Pan, L Ding, H Zhou, ... arXiv preprint arXiv:2410.04466, 2024 | 6 | 2024 |
Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization J Li, S Li, J Xu, S Huang, Y Lian, J Liu, Y Wang, G Dai arXiv preprint arXiv:2311.16442, 2023 | 2 | 2023 |
A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS Y Lian, X Yang, K Hong, Y Wang, G Dai, N Xu 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9, 2023 | 1 | 2023 |
A Point Transformer Accelerator With Distribution-Aware Heuristic Distance Calculation Y Lian, X Yang, K Hong, Y Wang, N Xu, G Dai IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024 | | 2024 |
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization J Li, J Xu, S Li, S Huang, J Liu, Y Lian, G Dai | | 2024 |