To find where you talk: Temporal sentence localization in video with attention based location regression Y Yuan, T Mei, W Zhu Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 9159-9166, 2019 | 364 | 2019 |
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos Y Yuan, L Ma, J Wang, W Liu, W Zhu IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (5), 2725 …, 2022 | 276 | 2022 |
Video summarization by learning deep side semantic embedding Y Yuan, T Mei, P Cui, W Zhu IEEE Transactions on Circuits and Systems for Video Technology 29 (1), 226-237, 2017 | 109 | 2017 |
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric Y Yuan, X Lan, X Wang, L Chen, Z Wang, W Zhu The 2nd International Workshop on Human-centric Multimedia Analysis (HUMA '21), 2021 | 72 | 2021 |
A Survey on Temporal Sentence Grounding in Videos X Lan, Y Yuan, X Wang, W Zhu ACM Transactions on Multimedia Computing, Communications and Applications 19 …, 2023 | 65 | 2023 |
Cross-modal dual learning for sentence-to-video generation Y Liu, X Wang, Y Yuan, W Zhu Proceedings of the 27th ACM International Conference on Multimedia, 1239-1247, 2019 | 39 | 2019 |
Sentence specified dynamic video thumbnail generation Y Yuan, L Ma, W Zhu Proceedings of the 27th ACM International Conference on Multimedia, 2332-2340, 2019 | 32 | 2019 |
Controllable video captioning with an exemplar sentence Y Yuan, L Ma, J Wang, W Zhu Proceedings of the 28th ACM International Conference on Multimedia, 1085-1093, 2020 | 22 | 2020 |
Curriculum multi-negative augmentation for debiased video grounding X Lan, Y Yuan, H Chen, X Wang, Z Jie, L Ma, Z Wang, W Zhu Proceedings of the AAAI Conference on Artificial Intelligence 37 (1), 1213-1221, 2023 | 17 | 2023 |
Syntax Customized Video Captioning by Imitating Exemplar Sentences Y Yuan, L Ma, W Zhu IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (12 …, 2021 | 7 | 2021 |
TimeMarker: A versatile Video-LLM for long and short video understanding with superior temporal localization ability S Chen, X Lan, Y Yuan, Z Jie, L Ma arXiv preprint arXiv:2411.18211, 2024 | 5 | 2024 |
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment X Xu, Y Yuan, Q Zhang, W Wu, Z Jie, L Ma, X Wang arXiv preprint arXiv:2312.09625, 2023 | 5 | 2023 |
VidCompress: Memory-enhanced temporal compression for video understanding in large language models X Lan, Y Yuan, Z Jie, L Ma arXiv preprint arXiv:2410.11417, 2024 | 3 | 2024 |
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models S Chen, Y Yuan, S Chen, Z Jie, L Ma arXiv preprint arXiv:2406.08024, 2024 | 2 | 2024 |
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance X Xu, Y Yuan, J Li, Q Zhang, Z Jie, L Ma, H Tang, N Sebe, X Wang ECCV 2024, 2024 | 1 | 2024 |
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach X Lan, Y Yuan, X Wang, L Chen, Z Wang, L Ma, W Zhu ACM Transactions on Multimedia Computing, Communications, and Applications …, 2023 | | 2023 |
Supplementary Materials for 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance X Xu, Y Yuan, J Li, Q Zhang, Z Jie, L Ma, H Tang, N Sebe, X Wang | | |