- Academic Search

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

Z Song, Z Ma, Y Yang, J Zhuo, X Chen - arXiv preprint arXiv:2412.00721, 2024 - arxiv.org

Large Language Models (LLMs) have showcased exceptional performance across diverse
NLP tasks, and their integration with speech encoder is rapidly emerging as a dominant …

Cited by 1

Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech

W Chen, B Yan, CC Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

A common criticism for current speech recognition benchmarks is the reliance on settings
which do not generalize well to real-world conversational environments, such as read …

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Y Yang, J Zhuo, Z Jin, Z Ma, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning (SSL) has achieved great success in speech-related tasks, driven
by advancements in speech encoder architectures and the expansion of datasets. While …

MADD: A Multi-Lingual Multi-Speaker Audio Deepfake Detection Dataset

X Qi, H Gu, J Yi, J Tao, Y Ren, J He… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org

AI-driven advancements in speech synthesis and voice conversion, now are able to
convincingly emulate human speech, have made a growing challenge for investigators and …

Comprehensive Benchmarking and Analysis of Open Pretrained Thai Speech Recognition Models

P Tipakasorn, O Chatthong… - … 27th Conference of …, 2024 - ieeexplore.ieee.org

This paper presents a comprehensive benchmarking and analysis of open pretrained Thai
Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource …

Saved to My library

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages...

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario