LUT-GEMM: Quantized Matrix Multiplication Based on LUTs for Efficient Inference in Large-Scale Generative Language Models G Park, B Park, M Kim, S Lee, J Kim, B Kwon, SJ Kwon, B Kim, Y Lee, ... arXiv preprint arXiv:2206.09557, 2023 | 123 | 2023 |
Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator G Park, J Kung, Y Lee IEEE Transactions on Circuits and Systems I: Regular Papers 68 (7), 2950-2961, 2021 | 45 | 2021 |
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization JY Yang, B Kim, J Bae, B Kwon, G Park, E Yang, SJ Kwon, D Lee arXiv preprint arXiv:2402.18096, 2024 | 27 | 2024 |
Simplified Compressor and Encoder Designs for Low-Cost Approximate Radix-4 Booth Multiplier G Park, J Kung, Y Lee IEEE Transactions on Circuits and Systems II: Express Briefs 70 (3), 1154-1158, 2022 | 19 | 2022 |
Energy-Efficient RISC-V-Based Vector Processor for Cache-Aware Structurally-Pruned Transformers JG Min, D Kam, Y Byun, G Park, Y Lee 2023 IEEE/ACM International Symposium on Low Power Electronics and Design …, 2023 | 5 | 2023 |
TF-MVP: Novel Sparsity-Aware Transformer Accelerator with Mixed-Length Vector Pruning E Yoo, G Park, JG Min, SJ Kwon, B Park, D Lee, Y Lee 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023 | 5 | 2023 |
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models Y Byun, S Moon, B Park, SJ Kwon, D Lee, G Park, E Yoo, JG Min, Y Lee Proceedings of Machine Learning and Systems 5, 2023 | 3 | 2023 |
Low-Power Encoder and Compressor Design for Approximate Radix-8 Booth Multiplier J Kim, G Park, Y Lee 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5, 2024 | 1 | 2024 |
An Investigation of FP8 Across Accelerators for LLM Inference J Kim, J Lee, G Park, B Kim, SJ Kwon, D Lee, Y Lee arXiv preprint arXiv:2502.01070, 2025 | | 2025 |
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models G Park, B Park, SJ Kwon, B Kim, Y Lee, D Lee arXiv preprint arXiv:2206.09557, 2022 | | 2022 |