Multi-modal dense video captioning
Dense video captioning is a task of localizing interesting events from an untrimmed video
and producing textual descriptions (captions) for each localized event. Most of the previous …
A better use of audio-visual cues: Dense video captioning with bi-modal transformer
Dense video captioning aims to localize and describe important events in untrimmed videos.
Existing methods mainly tackle this task by exploiting only visual features, while completely …
Watch, listen and tell: Multi-modal weakly supervised dense event captioning
Multi-modal learning, particularly among imaging and linguistic modalities, has made
amazing strides in many high-level fundamental visual understanding problems, ranging …
Temporal deformable convolutional encoder-decoder networks for video captioning
It is well believed that video captioning is a fundamental but challenging task in both
computer vision and artificial intelligence fields. The prevalent approach is to map an input …
Language model agnostic gray-box adversarial attack on image captioning
Adversarial susceptibility of neural image captioning is still under-explored due to the
complex multi-modal nature of the task. We introduce a GAN-based adversarial attack to …
TAVT: Towards Transferable Audio-Visual Text Generation
Audio-visual text generation aims to understand multi-modal content and translate it
into text. Although various transfer learning techniques for text generation have been …
Semantic similarity on multimodal data: A comprehensive survey with applications
Recently, the revival of the semantic similarity concept has been featured by the rapidly
growing artificial intelligence research fueled by advanced deep learning architectures …
Dense video captioning with early linguistic information fusion
Dense captioning methods generally detect events in videos first and then generate
captions for the individual events. Events are localized solely based on the visual cues while …
Deep reinforcement polishing network for video captioning
W Xu, J Yu, Z Miao, L Wan, Y Tian et al., IEEE Transactions on …, 2020
The video captioning task aims to describe video content using several natural-language
sentences. Although one-step encoder-decoder models have achieved promising progress …
I2Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning
TV show captioning aims to generate a linguistic sentence based on the video and its
associated subtitle. Compared to purely video-based captioning, the subtitle can provide the …