Computer vision-based cybernetics systems for promoting modern poultry farming: a critical review

X Yang, RB Bist, B Paneru, T Liu, T Applegate… - … and Electronics in …, 2024 - Elsevier
As global demands on the poultry production and welfare both intensify, the precision
poultry farming technologies such as computer vision-based cybernetics system is …

Recent advances in speech language models: A survey

W Cui, D Yu, X Jiao, Z Meng, G Zhang, Q Wang… - ar…

… Language-Speech Pre-training via Knowledge Distillation

C Wang, M Liao, Z Huang, J Zhang - arXiv preprint arXiv:2405.19041, 2024 - arxiv.org
Recent end-to-end approaches have shown promise in extending large language models
(LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment …

SALSA: Speedy ASR-LLM synchronous aggregation

A Mittal, D Prabhu, S Sarawagi, P Jyothi - arXiv preprint arXiv:2408.16542, 2024 - arxiv.org
Harnessing pre-trained LLMs to improve ASR systems, particularly for low-resource
languages, is now an emerging area of research. Existing methods range from using LLMs …

Roadmap towards superhuman speech understanding using large language models

F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …

Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

S Wang, CHH Yang, J Wu, C Zhang - arXiv preprint arXiv:2404.14716, 2024 - arxiv.org
Large language models (LLMs) can adapt to new tasks through in-context learning (ICL)
based on a few examples presented in dialogue history without any model parameter …

Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

CY Kuan, WP Huang, H Lee - arXiv preprint arXiv:2406.08402, 2024 - arxiv.org
Large audio-language models (LALMs) enhance traditional large language models by
integrating audio perception capabilities, allowing them to tackle audio-related tasks …

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

TK Lam, M Gaido, S Papi, L Bentivogli… - arXiv preprint arXiv …, 2025 - arxiv.org
Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is
increasing interest in extending their capabilities to speech--the most common form in …