- Academic Search

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by: 391. Versions: 12.

[PDF] thecvf.com

Zerocap: Zero-shot image-to-text generation for visual-semantic arithmetic

Y Tewel, Y Shalev, I Schwartz… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Recent text-to-image matching models apply contrastive learning to large corpora of
uncurated pairs of images and sentences. While such models can provide a powerful score …

Cited by: 171. Versions: 6.

Cross-modal text and visual generation: A systematic review. Part 1: Image to text

M Żelaszczyk, J Mańdziuk - Information Fusion, 2023 - Elsevier

We review the existing literature on generating text from visual data under the cross-modal
generation umbrella, which affords us to compare and contrast various approaches taking …

Cited by: 18. Versions: 4.

[PDF] arxiv.org

Language models can see: Plugging visual controls in text generation

Y Su, T Lan, Y Liu, F Liu, D Yogatama, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Generative language models (LMs) such as GPT-2/3 can be prompted to generate text with
remarkable quality. While they are designed for text-prompted generation, it remains an …

Cited by: 101. Versions: 3.

[PDF] nsf.gov

Using AI and social media multimodal content for disaster response and management: Opportunities, challenges, and future directions

M Imran, F Ofli, D Caragea, A Torralba - Information Processing & …, 2020 - Elsevier

People increasingly use Social Media (SM) platforms such as Twitter and Facebook
during disasters and emergencies to post situational updates including reports of injured or …

Cited by: 199. Versions: 4.

[PDF] jair.org

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Cited by: 161. Versions: 9.

[PDF] arxiv.org

Context-aware visual policy network for fine-grained image captioning

ZJ Zha, D Liu, H Zhang, Y Zhang… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

With the maturity of visual detection techniques, we are more ambitious in describing visual
content with open-vocabulary, fine-grained and free-form language, i.e., the task of image …

Cited by: 158. Versions: 9.

[PDF] neurips.cc

Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies

I Gat, I Schwartz, A Schwing… - Advances in Neural …, 2020 - proceedings.neurips.cc

Many recent datasets contain a variety of different data modalities, for instance, image,
question, and answer data in visual question answering (VQA). When training deep net …

Cited by: 98. Versions: 6.

[PDF] thecvf.com

Factor graph attention

I Schwartz, S Yu, T Hazan… - Proceedings of the …, 2019 - openaccess.thecvf.com

Dialog is an effective way to exchange information, but subtle details and nuances are
extremely important. While significant progress has paved a path to address visual dialog …

Cited by: 131. Versions: 14.

[PDF] academia.edu

[PDF] Zero-shot image-to-text generation for visual-semantic arithmetic

Y Tewel, Y Shalev, I Schwartz, L Wolf - arXiv preprint arXiv …, 2021 - academia.edu

Recent text-to-image matching models apply contrastive learning to large corpora of
uncurated pairs of images and sentences. While such models can provide a powerful score …

Cited by: 51.

Diverse and coherent paragraph generation from images