GPT4Vis: what can GPT-4 do for zero-shot visual recognition?
This paper does not present a novel method. Instead, it delves into an essential, yet
must-know baseline in light of the latest advancements in Generative Artificial Intelligence …
Uncertainty-aware sign language video retrieval with probability distribution modeling
Sign language video retrieval plays a key role in facilitating information access for the deaf
community. Despite significant advances in video-text retrieval, the complexity and inherent …
FreeVA: Offline MLLM as training-free video assistant
W Wu - arXiv preprint arXiv:2405.07798, 2024 - arxiv.org
This paper undertakes an empirical study to revisit the latest advancements in Multimodal
Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to …