GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper does not present a novel method. Instead, it delves into an essential, yet
must-know baseline in light of the latest advancements in Generative Artificial Intelligence …

Uncertainty-aware sign language video retrieval with probability distribution modeling

X Wu, H Li, Y Luo, X Cheng, X Zhuang, M Cao… - European Conference on …, 2024 - Springer
Sign language video retrieval plays a key role in facilitating information access for the deaf
community. Despite significant advances in video-text retrieval, the complexity and inherent …

FreeVA: Offline MLLM as training-free video assistant

W Wu - arXiv preprint arXiv:2405.07798, 2024 - arxiv.org
This paper undertakes an empirical study to revisit the latest advancements in Multimodal
Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to …