MiCT: Mixed 3D/2D convolutional tube for human action recognition Y Zhou, X Sun, ZJ Zha, W Zeng Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 289 | 2018 |
One-Shot Neural Architecture Search Through A Posteriori Distribution Guided Sampling Y Zhou, X Sun, C Luo, ZJ Zha, W Zeng arXiv preprint arXiv:1906.09557, 2019 | 200* | 2019 |
Context-reinforced semantic segmentation Y Zhou, X Sun, ZJ Zha, W Zeng Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 61 | 2019 |
Adaptive pooling in multi-instance learning for web video annotation Y Zhou, X Sun, D Liu, Z Zha, W Zeng Proceedings of the IEEE International Conference on Computer Vision …, 2017 | 57 | 2017 |
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View Y Zhou, X Sun, C Luo, ZJ Zha, W Zeng Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020 | 35 | 2020 |
Unsupervised visual representation learning by tracking patches in video G Wang, Y Zhou, C Luo, W Xie, W Zeng, Z Xiong Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 29 | 2021 |
Inter-X: Towards versatile human-human interaction analysis L Xu, X Lv, Y Yan, X Jin, S Wu, C Xu, Y Liu, Y Zhou, F Rao, X Sheng, Y Liu, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 18 | 2024 |
Image captioning with multi-context synthetic data F Ma, Y Zhou, F Rao, Y Zhang, X Sun Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4089-4097, 2024 | 9 | 2024 |
Posterior-Guided Neural Architecture Search Y Zhou, X Sun, C Luo, ZJ Zha, W Zeng Proceedings of the AAAI Conference on Artificial Intelligence, 2020 | 8 | 2020 |
ReGenNet: Towards human action-reaction synthesis L Xu, Y Zhou, Y Yan, X Jin, W Zhu, F Rao, X Yang, W Zeng Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 7 | 2024 |
Distribution consistent neural architecture search J Pan, C Sun, Y Zhou, Y Zhang, C Li Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 7 | 2022 |
Visual perception by large language model’s weights F Ma, H Xue, Y Zhou, G Wang, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... Advances in Neural Information Processing Systems 37, 28615-28635, 2025 | 5 | 2025 |
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling J Yang, D Yin, Y Zhou, F Rao, W Zhai, Y Cao, ZJ Zha arXiv preprint arXiv:2410.10798, 2024 | 4 | 2024 |
Number it: Temporal Grounding Videos like Flipping Manga Y Wu, X Hu, Y Sun, Y Zhou, W Zhu, F Rao, B Schiele, X Yang arXiv preprint arXiv:2411.10332, 2024 | 2 | 2024 |
EE-MLLM: A data-efficient and compute-efficient multimodal large language model F Ma, Y Zhou, H Li, Z He, S Wu, F Rao, Y Zhang, X Sun arXiv preprint arXiv:2408.11795, 2024 | 2 | 2024 |
VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild Y Zhou, C Luo, X Sun, ZJ Zha, W Zeng arXiv preprint arXiv:2101.12050, 2021 | 2 | 2021 |
Multi-Modal Generative Embedding Model F Ma, H Xue, G Wang, Y Zhou, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... arXiv preprint arXiv:2405.19333, 2024 | | 2024 |
Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models F Ma, Y Zhou, Y Zhang, S Wu, Z Zhang, Z He, F Rao, X Sun Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | | 2024 |
Text-Only Image Captioning with Multi-Context Data Generation. F Ma, Y Zhou, F Rao, Y Zhang, X Sun CoRR, 2023 | | 2023 |
ReGenNet: Towards Human Action-Reaction Synthesis* Appendix L Xu, Y Zhou, Y Yan, X Jin, W Zhu, F Rao, X Yang, W Zeng | | |