OmniVL: One foundation model for image-language and video-language tasks J Wang, D Chen, Z Wu, C Luo, L Zhou, Y Zhao, Y Xie, C Liu, YG Jiang, ... Advances in Neural Information Processing Systems 35, 5696-5710, 2022 | 149 | 2022 |
A battle of network structures: An empirical study of CNN, Transformer, and MLP Y Zhao, G Wang, C Tang, C Luo, W Zeng, ZJ Zha arXiv preprint arXiv:2108.13002, 2021 | 116 | 2021 |
Sparse MLP for image recognition: Is self-attention really necessary? C Tang, Y Zhao, G Wang, C Luo, W Xie, W Zeng Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2344-2351, 2022 | 107 | 2022 |
When shift operation meets vision transformer: An extremely simple alternative to attention mechanism G Wang, Y Zhao, C Tang, C Luo, W Zeng Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2423-2430, 2022 | 75 | 2022 |
Look before you match: Instance understanding matters in video object segmentation J Wang, D Chen, Z Wu, C Luo, C Tang, X Dai, Y Zhao, Y Xie, L Yuan, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 50 | 2023 |
Self-supervised visual representations learning by contrastive mask prediction Y Zhao, G Wang, C Luo, W Zeng, ZJ Zha Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 48 | 2021 |
ADriver-I: A general world model for autonomous driving F Jia, W Mao, Y Liu, Y Zhao, Y Wen, C Zhang, X Zhang, T Wang arXiv preprint arXiv:2311.13549, 2023 | 43 | 2023 |
Peripheral vision transformer J Min, Y Zhao, C Luo, M Cho Advances in Neural Information Processing Systems 35, 32097-32111, 2022 | 36 | 2022 |
Panacea: Panoramic and controllable video generation for autonomous driving Y Wen, Y Zhao, Y Liu, F Jia, Y Wang, C Luo, C Zhang, T Wang, X Sun, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 31 | 2024 |
Multi-scale group transformer for long sequence modeling in speech separation Y Zhao, C Luo, ZJ Zha, W Zeng Proceedings of the Twenty-Ninth International Conference on International …, 2021 | 15 | 2021 |
Streaming video model Y Zhao, C Luo, C Tang, D Chen, N Codella, ZJ Zha Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 14 | 2023 |
RetrieverTTS: Modeling decomposed factors for text-based speech insertion D Yin, C Tang, Y Liu, X Wang, Z Zhao, Y Zhao, Z Xiong, S Zhao, C Luo arXiv preprint arXiv:2206.13865, 2022 | 14 | 2022 |
Stream Query Denoising for Vectorized HD-Map Construction S Wang, F Jia, W Mao, Y Liu, Y Zhao, Z Chen, T Wang, C Zhang, X Zhang, ... European Conference on Computer Vision, 203-220, 2024 | 12 | 2024 |
Zero-shot text-to-speech for text-based insertion in audio narration C Tang, C Luo, Z Zhao, D Yin, Y Zhao, W Zeng arXiv preprint arXiv:2109.05426, 2021 | 9 | 2021 |
General-purpose speech representation learning through a self-supervised multi-granularity framework Y Zhao, D Yin, C Luo, Z Zhao, C Tang, W Zeng, ZJ Zha arXiv preprint arXiv:2102.01930, 2021 | 8 | 2021 |
Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? Y Bai, D Wu, Y Liu, F Jia, W Mao, Z Zhang, Y Zhao, J Shen, X Wei, ... arXiv preprint arXiv:2405.18361, 2024 | 7 | 2024 |
SubjectDrive: Scaling generative data in autonomous driving via subject control B Huang, Y Wen, Y Zhao, Y Hu, Y Liu, F Jia, W Mao, T Wang, C Zhang, ... arXiv preprint arXiv:2403.19438, 2024 | 5 | 2024 |
VLM-Eval: A General Evaluation on Video Large Language Models S Li, Y Zhang, Y Zhao, Q Wang, F Jia, Y Liu, T Wang arXiv preprint arXiv:2311.11865, 2023 | 3 | 2023 |
Reconstructive visual instruction tuning H Wang, A Zheng, Y Zhao, T Wang, Z Ge, X Zhang, Z Zhang arXiv preprint arXiv:2410.09575, 2024 | 2 | 2024 |
Attention-Guided Contrastive Masked Image Modeling for Transformer-Based Self-Supervised Learning Y Zhan, Y Zhao, C Luo, Y Zhang, X Sun 2023 IEEE International Conference on Image Processing (ICIP), 2490-2494, 2023 | 1 | 2023 |