- Academic Search

Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimoda...

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arxiv preprint arxiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Cited by: 741

A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2024 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by: 261

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arxiv preprint arxiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by: 1086

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by: 162

Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Cited by: 140

Vast: A vision-audio-subtitle-text omni-modality foundation model and dataset

S Chen, H Li, Q Wang, Z Zhao… - Advances in Neural …, 2024 - proceedings.neurips.cc

Vision and text have been fully explored in contemporary video-text foundational models,
while other modalities such as audio and subtitles in videos have not received sufficient …

Cited by: 105

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by: 133

Uni-MoE: Scaling unified multimodal LLMs with mixture of experts

Y Li, S Jiang, B Hu, L Wang, W Zhong… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Recent advancements in Multimodal Large Language Models (MLLMs) underscore the
significance of scalable models and data to boost performance, yet this often incurs …

Cited by: 23

Separate anything you describe

X Liu, Q Kong, Y Zhao, H Liu, Y Yuan… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …

Cited by: 40

Cat: Enhancing multimodal large language model to answer questions in dynamic audio-visual scenarios

Q Ye, Z Yu, R Shao, X Xie, P Torr, X Cao - European Conference on …, 2024 - Springer

This paper focuses on the challenge of answering questions in scenarios that are composed
of rich and complex dynamic audio-visual components. Although existing Multimodal Large …

Cited by: 12
