X-LLM: Bootstrapping advanced large language models by treating multi-modalities as foreign languages
Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4,
based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous …
Towards top-down reasoning: An explainable multi-agent approach for visual question answering
Recently, several methods have been proposed to augment large Vision Language Models
(VLMs) for Visual Question Answering (VQA) simply by incorporating external knowledge …
VisDiaHalBench: A visual dialogue benchmark for diagnosing hallucination in large vision-language models
Despite the significant success of large vision-language models (LVLMs), some studies
have revealed that LVLMs suffer from the hallucination problem, where the LVLMs' response …
ZRIGF: An innovative multimodal framework for zero-resource image-grounded dialogue generation
Image-grounded dialogue systems benefit greatly from integrating visual information,
resulting in high-quality response generation. However, current models struggle to …
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
Visual information is central to conversation: body gestures and physical behaviour, for
example, contribute to meaning that transcends words alone. To date, however, most neural …
Structure-Aware Multimodal Sequential Learning for Visual Dialog
YJ Kim, MJ Kim, K An, J Ahn, J Kim, YJ Heo… - Proceedings of the …, 2024 - ojs.aaai.org
With the ability to collect vast amounts of image and natural language data from the web,
there has been a remarkable advancement in Large-scale Language Models (LLMs). This …
A fine-grained deconfounding study for knowledge-based visual dialog
AA Liu, Q Wu, C Huang, C Xue, X Liu, N Xu - Visual Informatics, 2024 - Elsevier
Knowledge-based Visual Dialog is a challenging vision-language task, where an
agent engages in dialog to answer questions with humans based on the input image and …
FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval
J **e, J Kuang, Z Lin, J Ouyang, Z Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org
Given a query from one modality, few-shot cross-modal retrieval (CMR) retrieves
semantically similar instances in another modality with the target domain including classes …
Share What You Already Know: Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Mixed Data
Code-switching entails mixing multiple languages. It is an increasingly occurring
phenomenon in social media texts. Usually, code-mixed texts are written in a single script …
Multi-round dialogue state tracking by object-entity alignment in visual dialog
W Pang - CAAI International Conference on Artificial Intelligence, 2023 - Springer
Visual Dialog (VD) is a task where an agent answers a series of image-related questions
based on a multi-round dialog history. However, previous VD methods often treat the entire …