Multimodal research in vision and language: A review of current and emerging trends
Deep learning and its applications have catalyzed impactful research and development
across the diverse range of modalities present in real-world data. More recently, this has …
Video pivoting unsupervised multi-modal machine translation
The main challenge in unsupervised machine translation (UMT) is to align source and
target sentences in the latent space. As people who speak different languages share …
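To make "aligning source and target sentences in the latent space" concrete, below is a minimal sketch of two language-specific embedding tables feeding one shared encoder, so sentences from both languages land in a common space where an alignment loss can pull parallel pairs together. The GRU architecture, names, and dimensions are illustrative assumptions, not the paper's method (the paper itself pivots through video, which is not modeled here).

    import torch
    import torch.nn as nn

    class SharedLatentEncoder(nn.Module):
        # Two language-specific embedders feed one shared encoder, so source
        # and target sentences are mapped into a common latent space.
        def __init__(self, src_vocab, tgt_vocab, dim=512):
            super().__init__()
            self.embed = nn.ModuleDict({
                "src": nn.Embedding(src_vocab, dim),
                "tgt": nn.Embedding(tgt_vocab, dim),
            })
            self.encoder = nn.GRU(dim, dim, batch_first=True)

        def forward(self, token_ids, lang):
            # token_ids: (batch, seq_len) integer tensor; lang: "src" or "tgt".
            _, hidden = self.encoder(self.embed[lang](token_ids))
            return hidden[-1]  # (batch, dim) sentence representation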
Support-set bottlenecks for video-text representation learning
The dominant paradigm for learning video-text representations, noise contrastive learning,
increases the similarity of the representations of pairs of samples that are known to be …
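The snippet names noise contrastive learning as the dominant objective; below is a minimal sketch of the symmetric InfoNCE loss that this family of methods builds on, where matched video-text pairs in a batch are positives and all other pairings are negatives. The function name, temperature value, and batch-internal negatives are illustrative assumptions, not this paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def info_nce(video_emb, text_emb, temperature=0.07):
        # video_emb, text_emb: (batch, dim) embeddings of paired clips/captions.
        # Normalize so the dot product is cosine similarity.
        video_emb = F.normalize(video_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = video_emb @ text_emb.t() / temperature  # (batch, batch) similarities
        targets = torch.arange(video_emb.size(0), device=video_emb.device)
        # Diagonal entries are the known positive pairs; every other entry
        # in the same row or column serves as a negative.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))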
Experience grounds language
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …
Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis
Multimodal sentiment analysis aims to extract and integrate semantic information collected
from multiple modalities to recognize the expressed emotions and sentiment in multimodal …
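To illustrate what integrating information across modalities can look like in the pairwise spirit of the title, here is a minimal sketch in which the two text-centered bimodal pairs (text-acoustic and text-visual) are fused separately and then combined for a sentiment score. Layer choices, dimensions, and names are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class BiBimodalFusion(nn.Module):
        # Fuse the two text-centered bimodal pairs separately, then combine
        # them to predict a sentiment intensity score.
        def __init__(self, dim=128):
            super().__init__()
            self.fuse_ta = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
            self.fuse_tv = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
            self.head = nn.Linear(2 * dim, 1)

        def forward(self, text, audio, video):
            # text/audio/video: (batch, dim) utterance-level features.
            ta = self.fuse_ta(torch.cat([text, audio], dim=-1))
            tv = self.fuse_tv(torch.cat([text, video], dim=-1))
            return self.head(torch.cat([ta, tv], dim=-1))  # (batch, 1)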
Deep vision multimodal learning: Methodology, benchmark, and trend
Deep vision multimodal learning aims to combine deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …
Scene graph as pivoting: Inference-time image-free unsupervised multimodal machine translation with visual scene hallucination
In this work, we investigate a more realistic unsupervised multimodal machine translation
(UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text …
IGLUE: A benchmark for transfer learning across modalities, tasks, and languages
Reliable evaluation benchmarks designed for replicability and comprehensiveness have
driven progress in machine learning. Due to the lack of a multilingual benchmark, however …
Cross2StrA: Unpaired cross-lingual image captioning with cross-lingual cross-modal structure-pivoted alignment
Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency
issues, due to the inconsistencies of the semantic scene and syntax attributes during …
UC2: Universal cross-lingual cross-modal vision-and-language pre-training
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …