A Survey of Multimodal Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

CogVLM: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi… - Advances in …, 2025 - proceedings.neurips.cc
We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the …

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - Forty-first International …, 2024 - openreview.net
While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Video-LLaMA: An instruction-tuned audio-visual language model for video understanding

H Zhang, X Li, L Bing - arXiv preprint arXiv:2306.02858, 2023 - arxiv.org
We present Video-LLaMA, a multi-modal framework that empowers Large Language Models
(LLMs) with the capability of understanding both visual and auditory content in the video …

Evaluating object hallucination in large vision-language models

Y Li, Y Du, K Zhou, J Wang, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …