- Academic Search

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-lan...

A survey on generative ai and llm for video generation, understanding, and streaming

P Zhou, L Wang, Z Liu, Y Hao, P Hui, S Tarkoma… - arxiv preprint arxiv …, 2024 - arxiv.org

This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …

Cited by: 32 · 8 versions

[PDF] thecvf.com

Cap4video: What can auxiliary captions do for text-video retrieval?

W Wu, H Luo, B Fang, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Most existing text-video retrieval methods focus on cross-modal matching between the
visual content of videos and textual query sentences. However, in real-world scenarios …

Cited by: 91 · 6 versions

[PDF] aaai.org

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI conference on …, 2023 - ojs.aaai.org

Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

Cited by: 106 · 4 versions

[HTML] mdpi.com

[HTML] Deep learning innovations in video classification: A survey on techniques and dataset evaluations

M Mao, A Lee, M Hong - Electronics, 2024 - mdpi.com

Video classification has achieved remarkable success in recent years, driven by advanced
deep learning models that automatically categorize video content. This paper provides a …

Cited by: 5 · 4 versions

[PDF] arxiv.org

Ophnet: A large-scale video benchmark for ophthalmic surgical workflow understanding

M Hu, P Xia, L Wang, S Yan, F Tang, Z Xu… - … on Computer Vision, 2024 - Springer

Surgical scene perception via videos is critical for advancing robotic surgery, telesurgery,
and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and …

Cited by: 15 · 2 versions

[PDF] thecvf.com

Disentangling spatial and temporal learning for efficient image-to-video transfer learning

Z Qing, S Zhang, Z Huang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, large-scale pre-trained language-image models like CLIP have shown
extraordinary capabilities for understanding spatial contents, but naively transferring such …

Cited by: 26 · 5 versions

[PDF] thecvf.com

Lana: A language-capable navigator for instruction following and generation

X Wang, W Wang, J Shao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Recently, visual-language navigation (VLN)--entailing robot agents to follow navigation
instructions--has shown great advance. However, existing literature put most emphasis on …

Cited by: 38 · 6 versions

[PDF] neurips.cc

Alternating gradient descent and mixture-of-experts for integrated multimodal perception

H Akbari, D Kondratyuk, Y Cui… - Advances in …, 2023 - proceedings.neurips.cc

We present Integrated Multimodal Perception (IMP), a simple and scalable
multimodal multi-task training and modeling approach. IMP integrates multimodal inputs …

Cited by: 18 · 8 versions

[PDF] arxiv.org

GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arxiv preprint arxiv …, 2023 - arxiv.org

This paper does not present a novel method. Instead, it delves into an essential, yet must-
know baseline in light of the latest advancements in Generative Artificial Intelligence …

Cited by: 26 · 2 versions

[PDF] thecvf.com

What Can Simple Arithmetic Operations Do for Temporal Modeling?

W Wu, Y Song, Z Sun, J Wang, C Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Temporal modeling plays a crucial role in understanding video content. To tackle this
problem, previous studies built complicated temporal relations through time sequence …

Cited by: 10 · 5 versions
