Google Scholar

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer

Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Cited by 31

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arxiv preprint arxiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arxiv preprint arxiv …, 2023 - arxiv.org

While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Cited by 993

Video-llama: An instruction-tuned audio-visual language model for video understanding

H Zhang, X Li, L Bing - arxiv preprint arxiv:2306.02858, 2023 - arxiv.org

We present Video-LLaMA, a multi-modal framework that empowers Large Language Models
(LLMs) with the capability of understanding both visual and auditory content in the video …

Cited by 809

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arxiv preprint arxiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 157

Wavllm: Towards robust and adaptive speech large language model

S Hu, L Zhou, S Liu, S Chen, L Meng, H Hao… - arxiv preprint arxiv …, 2024 - arxiv.org

The recent advancements in large language models (LLMs) have revolutionized the field of
natural language processing, progressively broadening their scope to multimodal …

Cited by 52

Llama-omni: Seamless speech interaction with large language models

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

Models like GPT-4o enable real-time interaction with large language models (LLMs) through
speech, significantly enhancing user experience compared to traditional text-based …

Cited by 44

Connecting speech encoder and large language model for ASR

W Yu, C Tang, G Sun, X Chen, T Tan… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The impressive capability and versatility of large language models (LLMs) have aroused
increasing attention in automatic speech recognition (ASR), with several pioneering studies …

Cited by 49

Salm: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We present a novel Speech Augmented Language Model (SALM) with multitask and in-context
learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

Cited by 37

Advancing large language models to capture varied speaking styles and respond properly in spoken conversations

GT Lin, CH Chiang, H Lee - arxiv preprint arxiv:2402.12786, 2024 - arxiv.org

In spoken dialogue, even if two current turns are the same sentence, their responses might
still differ when they are spoken in different styles. The spoken styles, containing …

Cited by 19

On decoder-only architecture for speech-to-text and large language model integration

Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models
