Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arxiv preprint arxiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arxiv preprint arxiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Video-LLaMA: An instruction-tuned audio-visual language model for video understanding

H Zhang, X Li, L Bing - arxiv preprint arxiv:2306.02858, 2023 - arxiv.org
We present Video-LLaMA, a multi-modal framework that empowers Large Language Models
(LLMs) with the capability of understanding both visual and auditory content in the video …

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arxiv preprint arxiv …, 2023 - arxiv.org
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

WavLLM: Towards robust and adaptive speech large language model

S Hu, L Zhou, S Liu, S Chen, L Meng, H Hao… - arxiv preprint arxiv …, 2024 - arxiv.org
The recent advancements in large language models (LLMs) have revolutionized the field of
natural language processing, progressively broadening their scope to multimodal …

LLaMA-Omni: Seamless speech interaction with large language models

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Models like GPT-4o enable real-time interaction with large language models (LLMs) through
speech, significantly enhancing user experience compared to traditional text-based …

Connecting speech encoder and large language model for ASR

W Yu, C Tang, G Sun, X Chen, T Tan… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The impressive capability and versatility of large language models (LLMs) have aroused
increasing attention in automatic speech recognition (ASR), with several pioneering studies …

SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context
learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

Advancing large language models to capture varied speaking styles and respond properly in spoken conversations

GT Lin, CH Chiang, H Lee - arxiv preprint arxiv:2402.12786, 2024 - arxiv.org
In spoken dialogue, even if two current turns are the same sentence, their responses might
still differ when they are spoken in different styles. The spoken styles, containing …