Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation

X Dong, T Gan, X Song, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and then parse them …
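
The snippet notes that scene graph generation typically follows an encode-then-parse pipeline: visual contents are first encoded, then parsed into objects and relations. As a rough illustration only, here is a minimal PyTorch skeleton of such a pipeline, assuming detector-provided object features; the dimensions, layer counts, and heads are hypothetical and do not reproduce the paper's stacked hybrid-attention or group collaborative learning.

import torch
import torch.nn as nn

class SceneGraphEncoderDecoder(nn.Module):
    # Minimal encode-then-parse sketch (hypothetical dimensions and heads).
    def __init__(self, feat_dim=256, num_obj_classes=150, num_rel_classes=50):
        super().__init__()
        # Encoder: contextualize per-object visual features.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Decoders: classify each object and the predicate of each object pair.
        self.obj_head = nn.Linear(feat_dim, num_obj_classes)
        self.rel_head = nn.Linear(2 * feat_dim, num_rel_classes)

    def forward(self, obj_feats):
        # obj_feats: (batch, num_objects, feat_dim) from an off-the-shelf detector.
        ctx = self.encoder(obj_feats)
        obj_logits = self.obj_head(ctx)
        # Score every ordered object pair as a candidate (subject, predicate, object) triplet.
        subj = ctx.unsqueeze(2).expand(-1, -1, ctx.size(1), -1)
        obj = ctx.unsqueeze(1).expand(-1, ctx.size(1), -1, -1)
        rel_logits = self.rel_head(torch.cat([subj, obj], dim=-1))
        return obj_logits, rel_logits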

Token shift transformer for video classification

H Zhang, Y Hao, CW Ngo - Proceedings of the 29th ACM International …, 2021 - dl.acm.org
Transformers achieve remarkable success in understanding 1- and 2-dimensional signals
(e.g., NLP and image content understanding). As a potential alternative to convolutional …
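
As the title suggests, the core idea is a token shift operation for modeling time. The sketch below, in PyTorch, shifts a fraction of token channels one step forward and backward along the temporal axis at zero parameter cost; the tensor layout, shift ratio, and placement inside the transformer block are assumptions for illustration, not the paper's exact TokShift module.

import torch

def temporal_token_shift(x, shift_ratio=0.25):
    # x: (batch, time, tokens, channels). Shift a fraction of channels
    # one step forward/backward in time; leave the rest untouched.
    b, t, n, c = x.shape
    k = int(c * shift_ratio) // 2                     # channels per direction
    out = torch.zeros_like(x)
    out[:, 1:, :, :k] = x[:, :-1, :, :k]              # shift forward in time
    out[:, :-1, :, k:2 * k] = x[:, 1:, :, k:2 * k]    # shift backward in time
    out[:, :, :, 2 * k:] = x[:, :, :, 2 * k:]         # unshifted channels
    return out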

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed to
short durations. However, in practice, videos are generally untrimmed, containing much …

Personalized fashion compatibility modeling via metapath-guided heterogeneous graph learning

W Guan, F Jiao, X Song, H Wen, CH Yeh… - Proceedings of the 45th …, 2022 - dl.acm.org
Fashion Compatibility Modeling (FCM) is a new yet challenging task, which aims to
automatically assess the matching degree among a set of complementary items. Most of …

Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
This paper addresses the task of text-to-video retrieval: given a query in the form of a
natural-language sentence, the goal is to retrieve videos that are semantically relevant to …
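
In joint-embedding approaches to text-to-video retrieval, the query sentence and each candidate video are mapped into a shared space and ranked by similarity. The snippet below is a generic cosine-similarity ranking sketch in PyTorch, assuming the embeddings have already been computed; it is not the paper's reading-strategy model.

import torch
import torch.nn.functional as F

def rank_videos(text_emb, video_embs):
    # text_emb: (dim,) query embedding; video_embs: (num_videos, dim).
    text_emb = F.normalize(text_emb, dim=-1)
    video_embs = F.normalize(video_embs, dim=-1)
    scores = video_embs @ text_emb                    # cosine similarities
    order = torch.argsort(scores, descending=True)    # best match first
    return order, scores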

Partially relevant video retrieval

J Dong, X Chen, M Zhang, X Yang, S Chen… - Proceedings of the 30th …, 2022 - dl.acm.org
Current methods for text-to-video retrieval (T2VR) are trained and tested on
video-captioning-oriented datasets such as MSVD, MSR-VTT, and VATEX. A key property of these datasets is …

Scene graph refinement network for visual question answering

T Qian, J Chen, S Chen, B Wu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Visual Question Answering aims to answer free-form natural-language questions based
on the visual clues in a given image. It is a difficult problem, as it requires understanding the …

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …

More: Multi-order relation mining for dense captioning in 3d scenes

Y Jiao, S Chen, Z Jie, J Chen, L Ma… - European Conference on …, 2022 - Springer
3D dense captioning is a recently proposed task, where point clouds contain
more geometric information than their 2D counterparts. However, it is also more challenging …