BEVFormer v2: Adapting modern image backbones to bird's-eye-view recognition via perspective supervision C Yang, Y Chen, H Tian, C Tao, X Zhu, Z Zhang, G Huang, H Li, Y Qiao, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 263 | 2023 |
Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory X Zhu, Y Chen, H Tian, C Tao, W Su, C Yang, G Huang, B Li, L Lu, ... arXiv preprint arXiv:2305.17144, 2023 | 192* | 2023 |
EleGANt: Exquisite and locally editable GAN for makeup transfer C Yang, W He, Y Xu, Y Gao European Conference on Computer Vision, 737-754, 2022 | 37 | 2022 |
FaceCLIP: Facial image-to-video translation via a brief text description J Guo, H Manukyan, C Yang, C Wang, L Khachatryan, S Navasardyan, ... IEEE Transactions on Circuits and Systems for Video Technology, 2023 | 6 | 2023 |
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models C Yang, X Dong, X Zhu, W Su, J Wang, H Tian, Z Chen, W Wang, L Lu, ... arXiv preprint arXiv:2412.09613, 2024 | | 2024 |
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning C Yang, X Zhu, J Zhu, W Su, J Wang, X Dong, W Wang, L Lu, B Li, J Zhou, ... arXiv preprint arXiv:2406.07543, 2024 | | 2024 |