- Academic Search

J Zhang, S Wu, L Gao, HT Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …

Cited by 28 · All 5 versions

[PDF] thecvf.com

Embracing unimodal aleatoric uncertainty for robust multimodal fusion

Z Gao, X Jiang, X Xu, F Shen, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

As a fundamental problem in multimodal learning, multimodal fusion aims to compensate for
the inherent limitations of a single modality. One challenge of multimodal fusion is that the …

Cited by 5 · All 5 versions

[PDF] archive.org

Joint searching and grounding: Multi-granularity video content retrieval

Z Chen, X Jiang, X Xu, Z Cao, Y Mo… - Proceedings of the 31st …, 2023 - dl.acm.org

Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a
large collection in response to a given text query. Most existing TVR works assume that …

Cited by 16 · All 2 versions

[PDF] arxiv.org

Faster video moment retrieval with point-level supervision

X Jiang, Z Zhou, X Xu, Y Yang, G Wang… - Proceedings of the 31st …, 2023 - dl.acm.org

Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an
untrimmed video with natural language queries. Existing VMR methods suffer from two …

Cited by 16 · All 3 versions

Zero-shot video moment retrieval with angular reconstructive text embeddings

X Jiang, X Xu, Z Zhou, Y Yang, F Shen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at
retrieving a specific moment where the video content is semantically related to the text …

Cited by 4 · All 3 versions

Joint objective and subjective fuzziness denoising for multimodal sentiment analysis

X Jiang, X Xu, H Lu, L He… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal Sentiment Analysis (MSA) aims at teaching computers or robots to understand
human sentiment with diverse multimodal signals, including audio, vision, and text. Current …

Cited by 3 · All 2 versions

[PDF] techrxiv.org

Towards visual-prompt temporal answer grounding in instructional video

S Li, B Li, B Sun, Y Weng - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Temporal answer grounding in instructional video (TAGV) is a new task naturally derived
from temporal sentence grounding in general video (TSGV). Given an untrimmed …

Cited by 7 · All 8 versions

[PDF] arxiv.org

Constraint and union for partially-supervised temporal sentence grounding

C Ju, H Wang, J Liu, C Ma, Y Zhang, P Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Temporal sentence grounding aims to detect the event timestamps described by the natural
language query from given untrimmed videos. The existing fully-supervised setting achieves …

Cited by 13 · All 2 versions

MDCapsN: Multimodal, Multichannel, and Dual-Step Capsule Network for Natural Language Moment Localization

N Liu, X Sun, H Yu, F Yao, G Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Natural language moment localization aims to localize the target moment that matches a
given natural language query in an untrimmed video. The key to this challenging task is to …

Cited by 9 · All 3 versions

[PDF] arxiv.org

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection

H Zhao, KQ Lin, R Yan, Z Li - arXiv preprint arXiv:2308.15109, 2023 - arxiv.org

Video moment retrieval and highlight detection have received attention in the current era of
video content proliferation, aiming to localize moments and estimate clip relevances based …

Cited by 7 · All 2 versions

SDN: Semantic decoupling network for temporal language grounding

DePT: Decoupled prompt tuning