A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

Z Song, Z Ma, Y Yang, J Zhuo, X Chen - arXiv preprint arXiv:2412.00721, 2024 - arxiv.org
Large Language Models (LLMs) have showcased exceptional performance across diverse
NLP tasks, and their integration with speech encoders is rapidly emerging as a dominant …

Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech

W Chen, B Yan, CC Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
A common criticism of current speech recognition benchmarks is their reliance on settings
that do not generalize well to real-world conversational environments, such as read …

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Y Yang, J Zhuo, Z **, Z Ma, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised learning (SSL) has achieved great success in speech-related tasks, driven
by advancements in speech encoder architectures and the expansion of datasets. While …

MADD: A Multi-Lingual Multi-Speaker Audio Deepfake Detection Dataset

X Qi, H Gu, J Yi, J Tao, Y Ren, J He… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org
AI-driven advancements in speech synthesis and voice conversion, which can now
convincingly emulate human speech, have created a growing challenge for investigators and …

Comprehensive Benchmarking and Analysis of Open Pretrained Thai Speech Recognition Models

P Tipakasorn, O Chatthong… - … 27th Conference of …, 2024 - ieeexplore.ieee.org
This paper presents a comprehensive benchmarking and analysis of open pretrained Thai
Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource …