- Academic Search

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 1008, 8 versions

Deep learning for image-to-text generation: A technical overview

X He, L Deng - IEEE Signal Processing Magazine, 2017 - ieeexplore.ieee.org

Generating a natural language description from an image is an emerging interdisciplinary
problem at the intersection of computer vision, natural language processing, and artificial …

Cited by 140, 2 versions

[PDF] thecvf.com

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Cited by 1982, 8 versions

[HTML] sciencedirect.com

[HTML] Attention gated networks: Learning to leverage salient regions in medical images

J Schlemper, O Oktay, M Schaap, M Heinrich… - Medical image …, 2019 - Elsevier

We propose a novel attention gate (AG) model for medical image analysis that automatically
learns to focus on target structures of varying shapes and sizes. Models trained with AGs …

Cited by 1727, 13 versions

[PDF] theobjects.com

Attention U-Net: Learning where to look for the pancreas

O Oktay, J Schlemper, LL Folgoc, M Lee… - arXiv preprint arXiv …, 2018 - arxiv.org

We propose a novel attention gate (AG) model for medical imaging that automatically learns
to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly …

Cited by 7250, 9 versions

[PDF] arxiv.org

Bottom-up abstractive summarization

S Gehrmann, Y Deng, AM Rush - arXiv preprint arXiv:1808.10792, 2018 - arxiv.org

Neural network-based methods for abstractive summarization produce outputs that are more
fluent than other techniques, but which can be poor at content selection. This work proposes …

Cited by 864, 6 versions

[PDF] aaai.org

FiLM: Visual reasoning with a general conditioning layer

E Perez, F Strub, H De Vries, V Dumoulin… - Proceedings of the …, 2018 - ojs.aaai.org

We introduce a general-purpose conditioning method for neural networks called FiLM:
Feature-wise Linear Modulation. FiLM layers influence neural network computation via a …

Cited by 2322, 16 versions

[PDF] thecvf.com

CAMP: Cross-modal adaptive message passing for text-image retrieval

Z Wang, X Liu, H Li, L Sheng, J Yan… - Proceedings of the …, 2019 - openaccess.thecvf.com

Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …

Cited by 377, 8 versions

[PDF] neurips.cc

Learning by abstraction: The neural state machine

D Hudson, CD Manning - Advances in neural information …, 2019 - proceedings.neurips.cc

We introduce the Neural State Machine, seeking to bridge the gap between the
neural and symbolic views of AI and integrate their complementary strengths for the task of …

Cited by 326, 10 versions

[PDF] arxiv.org

ReCLIP: A strong zero-shot baseline for referring expression comprehension

S Subramanian, W Merrill, T Darrell, M Gardner… - arXiv preprint arXiv …, 2022 - arxiv.org

Training a referring expression comprehension (ReC) model for a new visual domain
requires collecting referring expressions, and potentially corresponding bounding boxes, for …

Cited by 121, 5 versions


Saved in My library

Bottom-up and top-down attention for image captioning and VQA

A comprehensive survey of deep learning for image captioning

Deep learning for image-to-text generation: A technical overview

GQA: A new dataset for real-world visual reasoning and compositional question answering

[HTML] Attention gated networks: Learning to leverage salient regions in medical images

Attention U-Net: Learning where to look for the pancreas

Bottom-up abstractive summarization

FiLM: Visual reasoning with a general conditioning layer

CAMP: Cross-modal adaptive message passing for text-image retrieval

Learning by abstraction: The neural state machine

ReCLIP: A strong zero-shot baseline for referring expression comprehension