X-clip: End-to-end multi-grained contrastive learning for video-text retrieval Y Ma, G Xu, X Sun, M Yan, J Zhang, R Ji ACM MM 2022, 638-647, 2022 | 268 | 2022 |
Towards local visual modeling for image captioning Y Ma, J Ji, X Sun, Y Zhou, R Ji Pattern Recognition (PR) 138, 109420, 2023 | 74 | 2023 |
Knowing what to learn: a metric-oriented focal mechanism for image captioning J Ji, Y Ma, X Sun, Y Zhou, Y Wu, R Ji IEEE Transactions on Image Processing 31, 4321-4335, 2022 | 43 | 2022 |
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance Y Ma, X Zhang, X Sun, J Ji, H Wang, G Jiang, W Zhuang, R Ji ICCV 2023, 2749-2760, 2023 | 38 | 2023 |
Rotated multi-scale interaction network for referring remote sensing image segmentation S Liu, Y Ma, X Zhang, H Wang, J Ji, X Sun, R Ji CVPR 2024, 26658-26668, 2024 | 32 | 2024 |
Knowing what it is: semantic-enhanced dual attention transformer Y Ma, J Ji, X Sun, Y Zhou, Y Wu, F Huang, R Ji IEEE Transactions on Multimedia (IEEE TMM), 2022 | 26 | 2022 |
3d-stmn: Dependency-driven superpoint-text matching network for end-to-end 3d referring expression segmentation C Wu, Y Ma, Q Chen, H Wang, G Luo, J Ji, X Sun AAAI 2024 38 (6), 5940-5948, 2024 | 17 | 2024 |
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval Y Ma, X Sun, J Ji, G Jiang, W Zhuang, R Ji ACM MM 2023, 4157-4168, 2023 | 15 | 2023 |
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks Z Qian, Y Ma, J Ji, X Sun AAAI 2024 38 (5), 4551-4559, 2024 | 13 | 2024 |
Beyond first impressions: Integrating joint multi-modal cues for comprehensive 3d representation H Wang, J Tang, J Ji, X Sun, R Zhang, Y Ma, M Zhao, L Li, Z Zhao, T Lv, ... ACM MM 2023, 3403-3414, 2023 | 12 | 2023 |
Creating High-quality 3D Content by Bridging the Gap Between Text-to-2D and Text-to-3D Generation Y Ma, Y Fan, J Ji, H Wang, H Yin, X Sun, R Ji ACM Transactions on Multimedia Computing, Communications and Applications, 2024 | 9* | 2024 |
Semi-supervised panoptic narrative grounding D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma, R Ji ACM MM 2023, 7164-7174, 2023 | 8 | 2023 |
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation D Yang, J Ji, Y Ma, T Guo, H Wang, X Sun, R Ji ICML 2024, 2024 | 5 | 2024 |
3D-GRES: Generalized 3D Referring Expression Segmentation C Wu, Y Liu, J Ji, Y Ma, H Wang, G Luo, H Ding, X Sun, R Ji ACM MM 2024, 2024 | 4 | 2024 |
Image Captioning via Dynamic Path Customization Y Ma, J Ji, X Sun, Y Zhou, X Hong, Y Wu, R Ji IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024 | 3 | 2024 |
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation Y Ma, Z Lin, J Ji, Y Fan, X Sun, R Ji ICML 2024, 2024 | 3 | 2024 |
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation T Guo, H Wang, Y Ma, J Ji, X Sun AAAI 2024 38 (3), 1985-1993, 2024 | 3 | 2024 |
JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues J Ji, H Wang, C Wu, Y Ma, X Sun, R Ji IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 | 3 | 2023 |
AnyTrans: Translate AnyText in the Image with Large Scale Models Z Qian, P Zhang, B Yang, K Fan, Y Ma, DF Wong, X Sun, R Ji EMNLP 2024 (Findings), 2024 | 2 | 2024 |
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation C Wu, Q Chen, J Ji, H Wang, Y Ma, Y Huang, G Luo, H Fei, X Sun, R Ji NeurIPS 2024, 2024 | 1 | 2024 |