Phi-3 technical report: A highly capable language model locally on your phone. M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... arXiv preprint arXiv:2404.14219, 2024 | Cited by 757 | 2024 |
BOND: BERT-assisted open-domain named entity recognition with distant supervision. C Liang, Y Yu, H Jiang, S Er, R Wang, T Zhao, C Zhang. Proceedings of the 26th ACM SIGKDD international conference on knowledge …, 2020 | Cited by 282 | 2020 |
LoftQ: LoRA-fine-tuning-aware quantization for large language models. Y Li, Y Yu, C Liang, P He, N Karampatziakis, W Chen, T Zhao. arXiv preprint arXiv:2310.08659, 2023 | Cited by 140 | 2023 |
PLATON: Pruning large transformer models with upper confidence bound of weight importance. Q Zhang, S Zuo, C Liang, A Bukharin, P He, W Chen, T Zhao. International Conference on Machine Learning, 26809-26823, 2022 | Cited by 90 | 2022 |
LoSparse: Structured compression of large language models based on low-rank and sparse approximation. Y Li, Y Yu, Q Zhang, C Liang, P He, W Chen, T Zhao. International Conference on Machine Learning, 20336-20350, 2023 | Cited by 74 | 2023 |
Less is more: Task-aware layer-wise distillation for language model compression. C Liang, S Zuo, Q Zhang, P He, W Chen, T Zhao. International Conference on Machine Learning, 20852-20867, 2023 | Cited by 74 | 2023 |
Super tickets in pre-trained language models: From model compression to improving generalization. C Liang, S Zuo, M Chen, H Jiang, X Liu, P He, T Zhao, W Chen. arXiv preprint arXiv:2105.12002, 2021 | Cited by 62 | 2021 |
MoEBERT: From BERT to mixture-of-experts via importance-guided adaptation. S Zuo, Q Zhang, C Liang, P He, T Zhao, W Chen. arXiv preprint arXiv:2204.07675, 2022 | Cited by 52 | 2022 |
Multi-domain neural machine translation with word-level adaptive layer-wise domain mixing. H Jiang, C Liang, C Wang, T Zhao. arXiv preprint arXiv:1911.02692, 2019 | Cited by 35 | 2019 |
HomoDistil: Homotopic task-agnostic distillation of pre-trained transformers. C Liang, H Jiang, Z Li, X Tang, B Yin, T Zhao. arXiv preprint arXiv:2302.09632, 2023 | Cited by 31 | 2023 |
Samba: Simple hybrid state space models for efficient unlimited context language modeling. L Ren, Y Liu, Y Lu, Y Shen, C Liang, W Chen. arXiv preprint arXiv:2406.07522, 2024 | Cited by 29 | 2024 |
A fully convolutional tri-branch network (FCTN) for domain adaptation. J Zhang, C Liang, CCJ Kuo. 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | Cited by 26 | 2018 |
Phi-3 technical report: A highly capable language model locally on your phone, 2024. M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... URL https://arxiv.org/abs/2404.14219, 2024 | Cited by 24 | 2024 |
No parameters left behind: Sensitivity guided adaptive learning rate for training large transformer models. C Liang, H Jiang, S Zuo, P He, X Liu, J Gao, W Chen, T Zhao. arXiv preprint arXiv:2202.02664, 2022 | Cited by 16 | 2022 |
Self-training with differentiable teacher. S Zuo, Y Yu, C Liang, H Jiang, S Er, C Zhang, T Zhao, H Zha. arXiv preprint arXiv:2109.07049, 2021 | Cited by 15 | 2021 |
Adversarial regularization as Stackelberg game: An unrolled optimization approach. S Zuo, C Liang, H Jiang, X Liu, P He, J Gao, W Chen, T Zhao. arXiv preprint arXiv:2104.04886, 2021 | Cited by 10 | 2021 |
Module-wise adaptive distillation for multimodality foundation models. C Liang, J Yu, MH Yang, M Brown, Y Cui, T Zhao, B Gong, T Zhou. Advances in Neural Information Processing Systems 36, 2024 | Cited by 8 | 2024 |
CAMERO: Consistency regularized ensemble of perturbed language models with weight sharing. C Liang, P He, Y Shen, W Chen, T Zhao. arXiv preprint arXiv:2204.06625, 2022 | Cited by 6 | 2022 |
Adversarial training as Stackelberg game: An unrolled optimization approach. S Zuo, C Liang, H Jiang, X Liu, P He, J Gao, W Chen, T Zhao. arXiv preprint arXiv:2104.04886, 2021 | Cited by 6 | 2021 |
GRIN: Gradient-informed MoE. L Liu, YJ Kim, S Wang, C Liang, Y Shen, H Cheng, X Liu, M Tanaka, X Wu, ... arXiv preprint arXiv:2409.12136, 2024 | Cited by 4 | 2024 |