Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models
P Janowczyk, L Laurier, A Giulietta, A Octavia… - arXiv preprint arXiv…, 2024 - arxiv.org
Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by
combining visual and text data, making applications like image captioning, visual question …
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
In this paper, we propose VideoLLaMA3, a more advanced multimodal foundation model for
image and video understanding. The core design philosophy of VideoLLaMA3 is vision …
From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs aligned with Multi-Modality
From the Specific-MLLM, which excels in single-modal tasks, to the Omni-MLLM, which
extends the range of general modalities, this evolution aims to achieve understanding and …
Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding
J Li, J Zhang, Z Jie, L Ma, G Li - arXiv preprint arXiv:2501.01926, 2025 - arxiv.org
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-
language understanding for downstream multi-modal tasks. Despite their success, LVLMs …