Google Scholar

Y Deng, P Lu, F Yin, Z Hu, S Shen, Q Gu, J Zou… - arXiv

GC Kang, J Kim, J Kim, BT Zhang - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object
via human-robot natural language interaction. Current IOG systems assume that a human …

Cited by 4 · All 4 versions


[PDF] acm.org

Enabling harmonious human-machine interaction with visual-context augmented dialogue system: A review

H Wang, B Guo, Y Zeng, M Chen, Y Ding… - ACM Transactions on …, 2022 - dl.acm.org

The intelligent dialogue system, aiming at communicating with humans harmoniously with
natural language, is brilliant for promoting the advancement of human-machine interaction …

Cited by 3 · All 3 versions


[PDF] openreview.net

Retrieval across any domains via large-scale pre-trained model

J Yan, Z Yin, C Xu, C Deng, H Huang - Forty-first International …, 2024 - openreview.net

In order to enhance the generalization ability towards unseen domains, universal cross-
domain image retrieval methods require a training dataset encompassing diverse domains …

All 4 versions


[PDF] thecvf.com

VD-GR: boosting visual dialog with cascaded spatial-temporal multi-modal graphs

A Abdessaied, L Shi, A Bulling - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

We propose VD-GR--a novel visual dialog model that combines pre-trained language
models (LMs) with graph neural networks (GNNs). Prior works mainly focused on one class …

Cited by 2 · All 8 versions


[PDF] arxiv.org

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

B Wen, Z Yang, J Wang, Z Gan, B Howe… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we build a visual dialogue dataset, named InfoVisDial, which provides rich
informative answers in each round even with external knowledge related to the visual …

Cited by 1 · All 3 versions


[PDF] arxiv.org

Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data

P Kumar, S Malik, B Raman, X Li - arXiv preprint arXiv:2402.07640, 2024 - arxiv.org

The ability to generate sentiment-controlled feedback in response to multimodal inputs
comprising text and images addresses a critical gap in human-computer interaction. This …

Cited by 2 · All 3 versions


The dialog must go on: Improving visual dialog via generative self-training

Enhancing large vision language models with self-training on image comprehension
