A Survey of Multimodal Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

F Fui-Hoon Nah, R Zheng, J Cai, K Siau… - Journal of Information …, 2023 - Taylor & Francis
Artificial intelligence (AI) has elicited much attention across disciplines and industries (Hyder
et al., 2019). AI has been defined as “a system's ability to correctly interpret external data, to …

ShareGPT4V: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Chat-UniVi: Unified visual representation empowers large language models with image and video understanding

P Jin, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

Are we on the right way for evaluating large vision-language models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

How robust is Google's Bard to adversarial image attacks?

Y Dong, H Chen, J Chen, Z Fang, X Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) that integrate text and other modalities
(especially vision) have achieved unprecedented performance in various multimodal tasks …