How to Protect Copyright Data in Optimization of Large Language Models? | T Chu, Z Song, C Yang | Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17871 …, 2024 | Cited by: 42 | Year: 2024
Towards Infinite-Long Prefix in Transformer | Y Liang, Z Shi, Z Song, C Yang | arXiv preprint arXiv:2406.14036, 2024 | Cited by: 17* | Year: 2024
Fine-tune language models to approximate unbiased in-context learning | T Chu, Z Song, C Yang | arXiv preprint arXiv:2310.03331, 2023 | Cited by: 16 | Year: 2023
Unmasking transformers: A theoretical approach to data recovery via attention weights | Y Deng, Z Song, S Xie, C Yang | arXiv preprint arXiv:2310.12462, 2023 | Cited by: 10 | Year: 2023
An automatic learning rate schedule algorithm for achieving faster convergence and steeper descent | Z Song, C Yang | arXiv preprint arXiv:2310.11291, 2023 | Cited by: 8 | Year: 2023
Attention is Naturally Sparse with Gaussian Distributed Input | Y Deng, Z Song, C Yang | arXiv preprint arXiv:2404.02690, 2024 | Cited by: 7 | Year: 2024
A theoretical insight into attack and defense of gradient leakage in transformer | C Li, Z Song, W Wang, C Yang | arXiv preprint arXiv:2311.13624, 2023 | Cited by: 6 | Year: 2023
Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond | Y Ke, Y Liang, Z Shi, Z Song, C Yang | arXiv preprint arXiv:2412.06061, 2024 | Cited by: 2 | Year: 2024
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space | R Addanki, C Li, Z Song, C Yang | arXiv preprint arXiv:2311.14652, 2023 | Cited by: 2 | Year: 2023
Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence | Y Deng, Z Song, C Yang | arXiv preprint arXiv:2402.01515, 2024 | Cited by: 1 | Year: 2024
Unlocking the Theory Behind Scaling 1-Bit Neural Networks | M Daliri, Z Song, C Yang | arXiv preprint arXiv:2411.01663, 2024 | Cited by: | Year: 2024