Multimodal large language models in health care: applications, challenges, and future outlook

R AlSaad, A Abd-Alrazaq, S Boughorbel… - Journal of medical …, 2024 - jmir.org
In the complex and multidimensional field of medicine, multimodal data are prevalent and
crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types …

E.T. Bench: Towards open-ended event-level video-language understanding

Y Liu, Z Ma, Z Qi, Y Wu, Y Shan, CW Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Video Large Language Models (Video-LLMs) have demonstrated their
great potential in general-purpose video understanding. To verify the significance of these …

Video Question Answering: A survey of the state-of-the-art

PJ Jeshmol, BC Kovoor - Journal of Visual Communication and Image …, 2024 - Elsevier
Video Question Answering (VideoQA) emerges as a prominent trend in the domain
of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves …