DePT: Decoupled prompt tuning

J Zhang, S Wu, L Gao, HT Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …

Embracing unimodal aleatoric uncertainty for robust multimodal fusion

Z Gao, X Jiang, X Xu, F Shen, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
As a fundamental problem in multimodal learning, multimodal fusion aims to compensate for
the inherent limitations of a single modality. One challenge of multimodal fusion is that the …

Joint searching and grounding: Multi-granularity video content retrieval

Z Chen, X Jiang, X Xu, Z Cao, Y Mo… - Proceedings of the 31st …, 2023 - dl.acm.org
Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a
large collection in response to a given text query. Most existing TVR works assume that …

Faster video moment retrieval with point-level supervision

X Jiang, Z Zhou, X Xu, Y Yang, G Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an
untrimmed video with natural language queries. Existing VMR methods suffer from two …

Zero-shot video moment retrieval with angular reconstructive text embeddings

X Jiang, X Xu, Z Zhou, Y Yang, F Shen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at
retrieving a specific moment where the video content is semantically related to the text …

Joint objective and subjective fuzziness denoising for multimodal sentiment analysis

X Jiang, X Xu, H Lu, L He… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multimodal Sentiment Analysis (MSA) aims at teaching computers or robotics to understand
human sentiment with diverse multimodal signals, including audio, vision, and text. Current …

Towards visual-prompt temporal answer grounding in instructional video

S Li, B Li, B Sun, Y Weng - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Temporal answer grounding in instructional video (TAGV) is a new task naturally derived
from temporal sentence grounding in general video (TSGV). Given an untrimmed …

Constraint and union for partially-supervised temporal sentence grounding

C Ju, H Wang, J Liu, C Ma, Y Zhang, P Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Temporal sentence grounding aims to detect the event timestamps described by the natural
language query from given untrimmed videos. The existing fully-supervised setting achieves …

MDCapsN: Multimodal, Multichannel, and Dual-Step Capsule Network for Natural Language Moment Localization

N Liu, X Sun, H Yu, F Yao, G Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Natural language moment localization aims to localize the target moment that matches a
given natural language query in an untrimmed video. The key to this challenging task is to …

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection

H Zhao, KQ Lin, R Yan, Z Li - arXiv preprint arXiv:2308.15109, 2023 - arxiv.org
Video moment retrieval and highlight detection have received attention in the current era of
video content proliferation, aiming to localize moments and estimate clip relevances based …