| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Caption anything: Interactive image description with diverse multimodal controls | T Wang*, J Zhang*, J Fei*, H Zheng, Y Tang, Z Li, M Gao, S Zhao | arXiv preprint arXiv:2305.02677 | 88 | 2023 |
| Transferable decoding with visual entities for zero-shot image captioning | J Fei*, T Wang*, J Zhang, Z He, C Wang, F Zheng | Proceedings of the IEEE/CVF International Conference on Computer Vision … | 39 | 2023 |
| Learning grounded vision-language representation for versatile understanding in untrimmed videos | T Wang*, J Zhang*, F Zheng, W Jiang, R Cheng, P Luo | arXiv preprint arXiv:2303.06378 | 12 | 2023 |
| LLMVA-GEBC: Large language model with video adapter for generic event boundary captioning | Y Tang, J Zhang, X Wang, T Wang, F Zheng | arXiv preprint arXiv:2306.10354 | 9 | 2023 |
| Reflective instruction tuning: Mitigating hallucinations in large vision-language models | J Zhang, T Wang, H Zhang, P Lu, F Zheng | European Conference on Computer Vision, 196-213 | 3 | 2024 |
| Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training | Z Liu, T Wang, J Zhang, F Zheng, W Jiang, K Lu | IEEE Transactions on Multimedia 25, 7894-7905 | 3 | 2022 |
| Exploiting context information for generic event boundary captioning | J Zhang, T Wang, F Zheng, R Cheng, P Luo | arXiv preprint arXiv:2207.01050 | 2 | 2022 |
| LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | T Geng, J Zhang, Q Wang, T Wang, J Duan, F Zheng | arXiv preprint arXiv:2411.19772 | 1 | 2024 |