A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
Large Language Models (LLMs) have showcased exceptional performance across diverse
NLP tasks, and their integration with speech encoder is rapidly emerging as a dominant …
NLP tasks, and their integration with speech encoder is rapidly emerging as a dominant …
Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech
A common criticism for current speech recognition benchmarks is the reliance on settings
which do not generalize well to real-world conversational environments, such as read …
which do not generalize well to real-world conversational environments, such as read …
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
Self-supervised learning (SSL) has achieved great success in speech-related tasks, driven
by advancements in speech encoder architectures and the expansion of datasets. While …
by advancements in speech encoder architectures and the expansion of datasets. While …
MADD: A Multi-Lingual Multi-Speaker Audio Deepfake Detection Dataset
AI-driven advancements in speech synthesis and voice conversion, now are able to
convincingly emulate human speech, have made a growing challenge for investigators and …
convincingly emulate human speech, have made a growing challenge for investigators and …
Comprehensive Benchmarking and Analysis of Open Pretrained Thai Speech Recognition Models
P Tipakasorn, O Chatthong… - … 27th Conference of …, 2024 - ieeexplore.ieee.org
This paper presents a comprehensive benchmarking and analysis of open pretrained Thai
Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource …
Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource …