
Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 545 · All 5 versions

Ternary adversarial networks with self-supervision for zero-shot cross-modal retrieval

X Xu, H Lu, J Song, Y Yang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

Given a query instance from one modality (e.g., image), cross-modal retrieval aims to find
semantically similar instances from another modality (e.g., text). To perform cross-modal …

Cited by 220 · All 4 versions

Graph embedding contrastive multi-modal representation learning for clustering

W **a, T Wang, Q Gao, M Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Multi-modal clustering (MMC) aims to explore complementary information from diverse
modalities for clustering performance facilitating. This article studies challenging problems in …

Cited by 49 · All 4 versions


[PDF] thecvf.com

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Cited by 22 · All 4 versions


[PDF] arxiv.org

MHTN: Modal-adversarial hybrid transfer network for cross-modal retrieval

X Huang, Y Peng, M Yuan - IEEE Transactions on Cybernetics, 2018 - ieeexplore.ieee.org

Cross-modal retrieval has drawn wide interest for retrieval across different modalities (such
as text, image, video, audio, and 3-D model). However, existing methods based on a deep …

Cited by 140 · All 5 versions


[PDF] aaai.org

Unsupervised domain adaptative temporal sentence localization with mutual information maximization

D Liu, X Fang, X Qu, J Dong, H Yan, Y Yang… - Proceedings of the …, 2024 - ojs.aaai.org

Temporal sentence localization (TSL) aims to localize a target segment in a video according
to a given sentence query. Though respectable works have made decent achievements in …

Cited by 6 · All 3 versions


[PDF] thecvf.com

Multi-modality associative bridging through memory: Speech sound recollected from face video

M Kim, J Hong, SJ Park, YM Ro - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

In this paper, we introduce a novel audio-visual multi-modal bridging framework that can
utilize both audio and visual information, even with uni-modal inputs. We exploit a memory …

Cited by 52 · All 8 versions

Joint feature synthesis and embedding: Adversarial cross-modal retrieval revisited

X Xu, K Lin, Y Yang, A Hanjalic… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Recently, generative adversarial network (GAN) has shown its strong ability on modeling
data distribution via adversarial learning. Cross-modal GAN, which attempts to utilize the …

Cited by 72 · All 5 versions


[PDF] arxiv.org

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Cited by 18 · All 7 versions

Learning cross-modal common representations by private–shared subspaces separation

X Xu, K Lin, L Gao, H Lu, HT Shen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Due to the inconsistent distributions and representations of different modalities (e.g., images
and texts), it is very challenging to correlate such heterogeneous data. A standard solution is …

Cited by 55 · All 4 versions


Saved to My library

Deep cross-media knowledge transfer

Deep multimodal representation learning: A survey

Ternary adversarial networks with self-supervision for zero-shot cross-modal retrieval

Graph embedding contrastive multi-modal representation learning for clustering

Dual alignment unsupervised domain adaptation for video-text retrieval

MHTN: Modal-adversarial hybrid transfer network for cross-modal retrieval

Unsupervised domain adaptative temporal sentence localization with mutual information maximization

Multi-modality associative bridging through memory: Speech sound recollected from face video

Joint feature synthesis and embedding: Adversarial cross-modal retrieval revisited

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

Learning cross-modal common representations by private–shared subspaces separation