Google Scholar

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 214 · All 2 versions

A comprehensive review of multimodal large language models: Performance and challenges across different tasks

J Wang, H Jiang, Y Liu, C Ma, X Zhang, Y Pan… - arXiv preprint arXiv …, 2024 - arxiv.org

In an era defined by the explosive growth of data and rapid technological advancements,
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …

Cited by 24 · All 3 versions

VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org

We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

Cited by 39 · All 5 versions

WavLLM: Towards robust and adaptive speech large language model

S Hu, L Zhou, S Liu, S Chen, L Meng, H Hao… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent advancements in large language models (LLMs) have revolutionized the field of
natural language processing, progressively broadening their scope to multimodal …

Cited by 48 · All 2 versions

LLaMA-Omni: Seamless speech interaction with large language models

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Models like GPT-4o enable real-time interaction with large language models (LLMs) through
speech, significantly enhancing user experience compared to traditional text-based …

Cited by 34 · All 4 versions

LauraGPT: Listen, attend, understand, and regenerate audio with GPT

Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative Pre-trained Transformer (GPT) models have achieved remarkable performance
on various natural language processing tasks, and have shown great potential as …

Cited by 44

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Cited by 34 · All 2 versions

WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling

S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …

Cited by 25 · All 3 versions

SpeechVerse: A large-scale generalizable audio language model

N Das, S Dingliwal, S Ronanki, R Paturi… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have shown incredible proficiency in performing tasks that
require semantic understanding of natural language instructions. Recently, many works …

Cited by 24 · All 2 versions

AudioBench: A universal benchmark for audio large language models

B Wang, X Zou, G Lin, S Sun, Z Liu, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce AudioBench, a universal benchmark designed to evaluate Audio Large
Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, among …

Cited by 18 · All 2 versions

Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models

MM-LLMs: Recent advances in multimodal large language models
