Google Scholar

Cited by 215

[PDF] hal.science

Deep multimodal fusion for semantic image segmentation: A survey

Y Zhang, D Sidibé, O Morel, F Mériaudeau - Image and Vision Computing, 2021 - Elsevier

Recent advances in deep learning have shown excellent performance in various scene
understanding tasks. However, in some complex environments or under challenging …

Cited by 295

A survey on multimodal large language models for autonomous driving

C Cui, Y Ma, X Cao, W Ye, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the emergence of Large Language Models (LLMs) and Vision Foundation Models
(VFMs), multimodal AI systems benefiting from large models have the potential to equally …

Cited by 959

Deep audio-visual speech recognition

T Afouras, JS Chung, A Senior… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 956

Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation

A Ephrat, I Mosseri, O Lang, T Dekel, K Wilson… - arXiv preprint arXiv …, 2018 - arxiv.org

We present a joint audio-visual model for isolating a single speech signal from a mixture of
sounds such as other speakers and background noise. Solving this task using only audio as …

Cited by 71

PMR: Prototypical modal rebalance for multimodal learning

Y Fan, W Xu, H Wang, J Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to
compensate for their inherent limitations. However, existing MML methods often optimize a …

Cited by 538

Audio-visual event localization in unconstrained videos

Y Tian, J Shi, B Li, Z Duan, C Xu - Proceedings of the …, 2018 - openaccess.thecvf.com

In this paper, we introduce a novel problem of audio-visual event localization in
unconstrained videos. We define an audio-visual event as an event that is both visible and …

Cited by 1018

Lip reading sentences in the wild

J Son Chung, A Senior, O Vinyals… - Proceedings of the …, 2017 - openaccess.thecvf.com

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 147

A survey on multimodal disinformation detection

F Alam, S Cresci, T Chakraborty, F Silvestri… - arXiv preprint arXiv …, 2021 - arxiv.org

Recent years have witnessed the proliferation of offensive content online such as fake news,
propaganda, misinformation, and disinformation. While initially this was mostly about textual …