| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | H Dong, X Yang, Z Zhang, Z Wang, Y Chi, B Chen | International Conference on Machine Learning (ICML) | 40* | 2024 |
| Fast and provable tensor robust principal component analysis via scaled gradient descent | H Dong, T Tong, C Ma, Y Chi | Information and Inference: A Journal of the IMA 12 (3), 1716–1758 | 17 | 2023 |
| Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | H Dong, B Chen, Y Chi | Conference on Language Modeling (COLM) | 12* | 2024 |
| ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | H Sun, LW Chang, W Bao, S Zheng, N Zheng, X Liu, H Dong, Y Chi, ... | arXiv preprint arXiv:2410.21465 | 11* | 2024 |
| Deep unfolded tensor robust PCA with self-supervised learning | H Dong, M Shah, S Donegan, Y Chi | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 7 | 2023 |
| Towards structured sparsity in transformers for efficient inference | H Dong, B Chen, Y Chi | Workshop on Efficient Systems for Foundation Models @ ICML 2023 | 6 | 2023 |
| A lightweight transformer for faster and robust EBSD data collection | H Dong, S Donegan, M Shah, Y Chi | Scientific Reports 13 (1), 21253 | 2 | 2023 |
| Learning optimal traffic routing behaviors using Markovian framework in microscopic simulation | T Cabannes, J Li, F Wu, H Dong, AM Bayen | Transportation Research Board Annual Meeting 2020 | 1 | 2020 |
| Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information | T Efimov, H Dong, M Shah, J Simmons, S Donegan, Y Chi | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | | 2025 |
| Towards Low-bit Communication for Tensor Parallel LLM Inference | H Dong, T Johnson, M Cho, E Soroush | NeurIPS Workshop on Efficient Natural Language and Speech Processing IV | | 2024 |