Google Scholar

M Cornia, L Baraldi, R Cucchiara - AI Communications, 2022 - content.iospress.com

Image Captioning is the task of translating an input image into a textual description. As such,
it connects Vision and Language in a generative fashion, with applications that range from …

Cited by 36 All 5 versions

[PDF] acm.org

Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval

N Messina, M Stefanini, M Cornia, L Baraldi… - Proceedings of the 19th …, 2022 - dl.acm.org

Image-text matching is gaining a leading role among tasks involving the joint understanding
of vision and language. In literature, this task is often used as a pre-training objective to …

Cited by 32 All 7 versions

Deep residual weight-sharing attention network with low-rank attention for visual question answering

B Qin, H Hu, Y Zhuang - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org

The attention-based networks have become prevailing recently in visual question answering
(VQA) due to their high performances. However, the extensive memory consumption of …

Cited by 25 All 2 versions

[PDF] arxiv.org

LOIS: looking out of instance semantics for visual question answering

S Zhang, Y Chen, Y Sun, F Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) has been intensively studied as a multimodal task,
requiring efforts to bridge vision and language for correct answer inference. Recent attempts …

Cited by 4 All 5 versions

Why are you traveling? Inferring trip profiles from online reviews and domain-knowledge

LGS Félix, W Cunha, CMV de Andrade… - Online Social Networks …, 2025 - Elsevier

This paper addresses the task of inferring trip profiles (TPs), which consists of determining
the profile of travelers engaged in a particular trip given a set of possible categories. TPs …

[PDF] arxiv.org

Learning to select: A fully attentive approach for novel object captioning

M Cagrandi, M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2021 - dl.acm.org

Image captioning models have lately shown impressive results when applied to standard
datasets. Switching to real-life scenarios, however, constitutes a challenge due to the larger …

Cited by 9 All 7 versions

[PDF] arxiv.org

Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline

N Messina, L Vadicamo, L Maltese… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent advancements in deep learning have significantly enhanced content-based retrieval
methods, notably through models like CLIP that map images and texts into a shared …

All 2 versions

[PDF] arxiv.org

Is CLIP the main roadblock for fine-grained open-world perception?

L Bianchi, F Carrara, N Messina, F Falchi - arxiv preprint arxiv …, 2024 - arxiv.org

Modern applications increasingly demand flexible computer vision models that adapt to
novel concepts not encountered during training. This necessity is pivotal in emerging …

Cited by 2 All 2 versions

[PDF] unimore.it

Trasformare Visione e Linguaggio con Attenzione [Transforming Vision and Language with Attention]

M Stefanini - 2023 - iris.unimore.it

Attention mechanism and Transformer-based architectures have recently revolutionized the
artificial intelligence landscape in almost every field. Ever since their first introduction, they …

All 2 versions

Cross-modal information interaction reasoning network for image-text retrieval [用于图文检索的跨模态信息交互推理网络]

魏钰琦， **宁 - Journal of Computer Engineering & …, 2023 - search.ebscohost.com

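To address the mismatch in semantic feature complexity between the image and text modalities in cross-modal retrieval, this paper proposes an image-text matching method that combines local fine-grained
alignment with global feature reasoning. Image and text features are first fed into an adaptive cross-attention network …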


A novel attention-based aggregation function to combine vision and language

Explaining transformer-based image captioning models: An empirical analysis
