LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou, W Zhang, P Lu, C He, ... arXiv preprint arXiv:2304.15010, 2023 | 539 | 2023 |
Avatar-net: Multi-scale zero-shot style transfer by feature decoration L Sheng, Z Lin, J Shao, X Wang Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2018 | 366 | 2018 |
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning J Pan*, Z Lin*, X Zhu, J Shao, H Li Advances in Neural Information Processing Systems 35, 26462-26477, 2022 | 252 | 2022 |
Frozen clip models are efficient video learners Z Lin, S Geng, R Zhang, P Gao, G de Melo, X Wang, J Dai, Y Qiao, H Li European Conference on Computer Vision, 388-404, 2022 | 237 | 2022 |
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models Z Lin, D Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, W Shao, K Chen, ... European Conference on Computer Vision, 36-55, 2025 | 222* | 2025 |
MCMAE: Masked convolution meets masked autoencoders P Gao, T Ma, H Li, Z Lin, J Dai, Y Qiao Advances in Neural Information Processing Systems 35, 35632-35644, 2022 | 191* | 2022 |
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models D Liu, R Zhang, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, P Jin, ... Forty-first International Conference on Machine Learning, 2024 | 95* | 2024 |
Retrieving-to-answer: Zero-shot video question answering with frozen large language models J Pan*, Z Lin*, Y Ge, X Zhu, R Zhang, Y Wang, Y Qiao, H Li Proceedings of the IEEE/CVF International Conference on Computer Vision, 272-283, 2023 | 26 | 2023 |
Mimic before reconstruct: Enhancing masked autoencoders with feature mimicking P Gao, Z Lin, R Zhang, R Fang, H Li, H Li, Y Qiao International Journal of Computer Vision 132 (5), 1546-1556, 2024 | 18 | 2024 |
1st place solution for ava-kinetics crossover in acitivitynet challenge 2020 S Chen, J Pan, G Song, M Zhang, H Shao, Z Lin, J Shao, H Li, Y Liu arXiv preprint arXiv:2006.09116, 2020 | 6 | 2020 |
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model X Chi, Y Liu, Z Jiang, R Zhang, Z Lin, R Zhang, P Gao, C Fu, S Zhang, ... arXiv preprint arXiv:2311.17963, 2023 | 3 | 2023 |
TerDiT: Ternary Diffusion Models with Transformers X Lu, A Zhou, Z Lin, Q Liu, Y Xu, R Zhang, Y Wen, S Ren, P Gao, J Yan, ... arXiv preprint arXiv:2405.14854, 2024 | | 2024 |