- Academic Search

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 395

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 1008

[PDF] thecvf.com

Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation

X Wang, Q Huang, A Celikyilmaz… - Proceedings of the …, 2019 - openaccess.thecvf.com

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out
natural language instructions inside real 3D environments. In this paper, we study how to …

Cited by 616

[PDF] arxiv.org

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org

Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Cited by 434

[PDF] thecvf.com

Neural motifs: Scene graph parsing with global context

R Zellers, M Yatskar, S Thomson… - Proceedings of the …, 2018 - openaccess.thecvf.com

We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …

Cited by 1168

[PDF] thecvf.com

Making the v in vqa matter: Elevating the role of image understanding in visual question answering

Y Goyal, T Khot, D Summers-Stay… - Proceedings of the …, 2017 - openaccess.thecvf.com

Problems at the intersection of vision and language are of significant importance both as
challenging research questions and for the rich set of applications they enable. However …

Cited by 3436

[PDF] koreamed.org

Deep learning in medical imaging: general overview

JG Lee, S Jun, YW Cho, H Lee… - Korean journal of …, 2017 - synapse.koreamed.org

The artificial neural network (ANN)–a machine learning technique inspired by the human
neuronal synapse system–was introduced in the 1950s. However, the ANN was previously …

Cited by 1487

[PDF] thecvf.com

Knowing when to look: Adaptive attention via a visual sentinel for image captioning

J Lu, C Xiong, D Parikh… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

Attention-based neural encoder-decoder frameworks have been widely adopted for image
captioning. Most methods force visual attention to be active for every generated word …

Cited by 1932

[PDF] thecvf.com

Neural baby talk

J Lu, J Yang, D Batra, D Parikh - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

We introduce a novel framework for image captioning that can produce natural language
explicitly grounded in entities that object detectors find in the image. Our approach …

Cited by 589

[PDF] thecvf.com

Online multi-object tracking with dual matching attention networks

J Zhu, H Yang, N Liu, M Kim… - Proceedings of the …, 2018 - openaccess.thecvf.com

In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates
the merits of single object tracking and data association methods in a unified framework to …

Cited by 444

Mind's eye: A recurrent visual representation for image caption generation

From show to tell: A survey on deep learning-based image captioning