- Academic Search

Vlp: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer

In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Cited by 217

X-llm: Bootstrapping advanced large language models by treating multi-modalities as foreign languages

F Chen, M Han, H Zhao, Q Zhang, J Shi, S Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4,
based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous …

Cited by 111

GoG: Relation-aware graph-over-graph network for visual dialog

F Chen, X Chen, F Meng, P Li, J Zhou - arXiv preprint arXiv:2109.08475, 2021 - arxiv.org

Visual dialog, which aims to hold a meaningful conversation with humans about a given
image, is a challenging task that requires models to reason the complex dependencies …

Cited by 34

Improving cross-modal understanding in visual dialog via contrastive learning

F Chen, X Chen, S Xu, B Xu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to
answer a series of questions after reasoning over both the image content and dialog history …

Cited by 24

The dialog must go on: Improving visual dialog via generative self-training

GC Kang, S Kim, JH Kim, D Kwak… - Proceedings of the …, 2023 - openaccess.thecvf.com

Visual dialog (VisDial) is a task of answering a sequence of questions grounded in an
image, using the dialog history as context. Prior work has trained the dialog agents solely on …

Cited by 16

KBGN: Knowledge-bridge graph network for adaptive vision-text reasoning in visual dialogue

X Jiang, S Du, Z Qin, Y Sun, J Yu - Proceedings of the 28th ACM …, 2020 - dl.acm.org

Visual dialogue is a challenging task that needs to extract implicit information from both
visual (image) and textual (dialogue history) contexts. Classical approaches pay more …

Cited by 39

Reasoning with multi-structure commonsense knowledge in visual dialog

S Zhang, X Jiang, Z Yang, T Wan… - Proceedings of the …, 2022 - openaccess.thecvf.com

Visual Dialog requires an agent to engage in a conversation with humans grounded in an
image. Many studies on Visual Dialog focus on the understanding of the dialog history or the …

Cited by 15

Unsupervised and pseudo-supervised vision-language alignment in visual dialog

F Chen, D Zhang, X Chen, J Shi, S Xu… - Proceedings of the 30th …, 2022 - dl.acm.org

Visual dialog requires models to give reasonable answers according to a series of coherent
questions and related visual concepts in images. However, most current work either focuses …

Cited by 16

HVLM: Exploring human-like visual cognition and language-memory network for visual dialog

K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier

Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …

Cited by 12

Learning dual encoding model for adaptive visual understanding in visual dialogue

J Yu, X Jiang, Z Qin, W Zhang, Y Hu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Different from Visual Question Answering task that requires to answer only one question
about an image, Visual Dialogue task involves multiple rounds of dialogues which cover a …

Cited by 29

Dmrm: A dual-channel multi-hop reasoning model for visual dialog

Vlp: A survey on vision-language pre-training
