Machine-generated text: A comprehensive survey of threat models and detection methods

EN Crothers, N Japkowicz, HL Viktor - IEEE Access, 2023 - ieeexplore.ieee.org
Machine-generated text is increasingly difficult to distinguish from text authored by humans.
Powerful open-source models are freely available, and user-friendly tools that democratize …

Align and attend: Multimodal summarization with dual contrastive losses

B He, J Wang, J Qiu, T Bui… - Proceedings of the …, 2023 - openaccess.thecvf.com
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …

DocFormerv2: Local features for document understanding

S Appalaraju, P Tang, Q Dong, N Sankaran… - Proceedings of the …, 2024 - ojs.aaai.org
We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding
(VDU). The VDU domain entails understanding documents (beyond mere OCR predictions) …

Learning attention propagation for compositional zero-shot learning

MGZA Khan, MF Naeem, L Van Gool… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compositional zero-shot learning aims to recognize unseen compositions of seen visual
primitives of object classes and their states. While all primitives (states and objects) are …

Separate and locate: Rethink the text in text-based visual question answering

C Fang, J Li, L Li, C Ma, D Hu - … of the 31st ACM International Conference …, 2023 - dl.acm.org
Text-based Visual Question Answering (TextVQA) aims at answering questions about the
text in images. Most works in this field focus on designing network structures or pre-training …

PreSTU: Pre-training for scene-text understanding

J Kil, S Changpinyo, X Chen, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

Filling in the blank: Rationale-augmented prompt tuning for TextVQA

G Zeng, Y Zhang, Y Zhou, B Fang, G Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org
Recently, generative Text-based Visual Question Answering (TextVQA) methods, which are
often based on language models, have exhibited impressive results and drawn increasing …

Toward 3D spatial reasoning for human-like text-based visual question answering

H Li, J Huang, P Jin, G Song, Q Wu, J Chen - arXiv preprint arXiv …, 2022 - arxiv.org
Text-based Visual Question Answering (TextVQA) aims to produce correct answers for
given questions about the images with multiple scene texts. In most cases, the texts naturally …

Prophet: Prompting large language models with complementary answer heuristics for knowledge-based visual question answering

Z Yu, X Ouyang, Z Shao, M Wang, J Yu - arXiv preprint arXiv:2303.01903, 2023 - arxiv.org
Knowledge-based visual question answering (VQA) requires external knowledge beyond
the image to answer the question. Early studies retrieve required knowledge from explicit …

Asymmetric cross-modal attention network with multimodal augmented mixup for medical visual question answering

Y Li, Q Yang, FL Wang, LK Lee, Y Qu, T Hao - Artificial Intelligence in …, 2023 - Elsevier
Insufficient training data is a common barrier to effectively learning multimodal information
interactions and question semantics in existing medical Visual Question Answering (VQA) …