Semantic-conditional diffusion networks for image captioning | J Luo, Y Li, Y Pan, T Yao, J Feng, H Chao, T Mei | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 87 | 2023 |
Auto-captions on GIF: A large-scale video-sentence dataset for vision-language pre-training | Y Pan, Y Li, J Luo, J Xu, T Yao, T Mei | Proceedings of the 30th ACM International Conference on Multimedia, 7070-7074, 2022 | 68 | 2022 |
CoCo-BERT: Improving video-language pre-training with contrastive cross-modal matching and denoising | J Luo, Y Li, Y Pan, T Yao, H Chao, T Mei | Proceedings of the 29th ACM International Conference on Multimedia, 5600-5608, 2021 | 49 | 2021 |
Boosting vision-and-language navigation with direction guiding and backtracing | J Chen, J Luo, Y Pan, Y Li, T Yao, H Chao, T Mei | ACM Transactions on Multimedia Computing, Communications and Applications 19 …, 2023 | 8 | 2023 |
Exploring vision-language foundation model for novel object captioning | J Luo, Y Li, Y Pan, T Yao, J Feng, H Chao, T Mei | IEEE Transactions on Circuits and Systems for Video Technology, 2024 | 1 | 2024 |
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | J Luo, J Chen, Y Li, Y Pan, J Feng, H Chao, T Yao | European Conference on Computer Vision, 237-254, 2024 | | 2024 |