Google Scholar

Exploring the reasoning abilities of multimodal large language models (mllms): A comprehensive survey on emerging trends in multimodal reasoning

Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract
reasoning ability is the goal of next-generation AI. Recent advancements in Large Language …

Cited by 34

Mminstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity

Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang… - Science China …, 2024 - Springer

Despite the effectiveness of vision-language supervised fine-tuning in enhancing the
performance of vision large language models (VLLMs), existing visual instruction tuning …

Cited by 17

Investigating and mitigating the multimodal hallucination snowballing in large vision-language models

W Zhong, X Feng, L Zhao, Q Li, L Huang, Y Gu… - arxiv preprint arxiv …, 2024 - arxiv.org

Though advanced in understanding visual information with human languages, Large Vision-
Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is …

Cited by 8

Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems

S Sun, S Zhu - arxiv preprint arxiv:2405.11629, 2024 - arxiv.org

Numerous studies on adversarial attacks targeting self-driving policies fail to incorporate
realistic-looking adversarial objects, limiting real-world applicability. Building upon prior …

Fine Tuning Idefic 9b With LORA for Multimodal Medical VQA

S Punneshetty, S Ashok… - 2024 International …, 2024 - ieeexplore.ieee.org

We introduce an innovative approach to Medical Visual Question Answering (VQA) by fine-
tuning the Idefic 9b model on a combined dataset from SLAKE and VQARAD. Our …

Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model
