- Academic Search

Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Cited by 8

Surveying the MLLM landscape: A meta-review of current surveys

M Li, K Chen, Z Bi, M Liu, B Peng, Q Niu, J Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

The rise of Multimodal Large Language Models (MLLMs) has become a transformative force
in the field of artificial intelligence, enabling machines to process and generate content …

Cited by 9

Safe-CLIP: Removing NSFW concepts from vision-and-language models

S Poppi, T Poppi, F Cocchi, M Cornia, L Baraldi… - … on Computer Vision, 2024 - Springer

Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale
data, which can introduce inappropriate content and lead to the development of unsafe and …

Cited by 14

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

S Sarto, M Cornia, L Baraldi, R Cucchiara - European Conference on …, 2024 - Springer

Effectively aligning with human judgment when evaluating machine-generated image
captions represents a complex yet intriguing challenge. Existing evaluation metrics like …

Cited by 6

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

D Caffagni, F Cocchi, N Moratelli… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal LLMs are the natural evolution of LLMs and enlarge their capabilities so as to
work beyond the pure textual modality. As research is being carried out to design novel …

Cited by 27

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arxiv preprint arxiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 5

Computer audition: From task-specific machine learning to foundation models

A Triantafyllopoulos, I Tsangko, A Gebhard… - arxiv preprint arxiv …, 2024 - arxiv.org

Foundation models (FMs) are increasingly spearheading recent advances on a variety of
tasks that fall under the purview of computer audition--the use of machines to understand …

Cited by 4

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

N Moratelli, D Caffagni, M Cornia, L Baraldi… - arxiv preprint arxiv …, 2024 - arxiv.org

The conventional training approach for image captioning involves pre-training a network
using teacher forcing and subsequent fine-tuning with Self-Critical Sequence Training to …

Cited by 4

Personalizing multimodal large language models for image captioning: an Experimental analysis

D Bucciarelli, N Moratelli, M Cornia, L Baraldi… - arxiv preprint arxiv …, 2024 - arxiv.org

The task of image captioning demands an algorithm to generate natural language
descriptions of visual inputs. Recent advancements have seen a convergence between …

Cited by 3

A survey on the memory mechanism of large language model based agents

Z Zhang, X Bo, C Ma, R Li, X Chen, Q Dai, J Zhu… - arxiv preprint arxiv …, 2024 - arxiv.org

Large language model (LLM) based agents have recently attracted much attention from the
research and industry communities. Compared with original LLMs, LLM-based agents are …

Cited by 64
