Google Scholar

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 1236; 12 versions

[PDF] arxiv.org

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Cited by 763; 6 versions

[PDF] zhjwpku.com

[PDF] A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com

Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

Cited by 3771; 6 versions

[PDF] arxiv.org

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European conference on …, 2024 - Springer

Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

Cited by 789; 9 versions

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org

While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Cited by 984; 2 versions

[PDF] openreview.net

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - Forty-first International …, 2024 - openreview.net

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

Cited by 511; 6 versions

[PDF] arxiv.org

Video-LLaMA: An instruction-tuned audio-visual language model for video understanding

H Zhang, X Li, L Bing - arXiv preprint arXiv:2306.02858, 2023 - arxiv.org

We present Video-LLaMA, a multi-modal framework that empowers Large Language Models
(LLMs) with the capability of understanding both visual and auditory content in the video …

Cited by 800; 5 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 229; 7 versions

[PDF] arxiv.org

Video-ChatGPT: Towards detailed video understanding via large vision and language models

M Maaz, H Rasheed, S Khan, FS Khan - arXiv preprint arXiv:2306.05424, 2023 - arxiv.org

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to
interact with visual data. While there have been initial attempts for image-based …

Cited by 591; 6 versions

[PDF] thecvf.com

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions which current multimodal systems largely struggle to imitate. In this work …

Cited by 208; 6 versions
