Thank you for attention: A survey on attention-based artificial neural networks for automatic speech recognition
Attention is a very popular and effective mechanism in artificial neural network-based
sequence-to-sequence models. In this survey paper, a comprehensive review of the different …
Libriheavy: A 50,000 hours ASR corpus with punctuation casing and context
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours
of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is …
An embarrassingly simple approach for LLM with strong ASR capacity
In this paper, we focus on solving one of the most important tasks in the field of speech
processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and …
Towards universal speech discrete tokens: A case study for ASR and TTS
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into
utilizing discrete tokens for speech tasks like recognition and translation, which offer lower …
VALL-T: Decoder-only generative transducer for robust and decoding-controllable text-to-speech
Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and
VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot …
Exploring the capability of Mamba in speech applications
This paper explores the capability of Mamba, a recently proposed architecture based on
state space models (SSMs), as a competitive alternative to Transformer-based models. In …
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
The evolution of speech technology has been spurred by the rapid increase in dataset sizes.
Traditional speech models generally depend on a large amount of labeled training data …
PromptASR for contextualized ASR with controllable style
Prompts are crucial to large language models as they provide context information such as
topic or logical relationships. Inspired by this, we propose PromptASR, a framework that …
Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models
Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …
LibriheavyMix: a 20,000-hour dataset for single-channel reverberant multi-talker speech separation, ASR and speaker diarization
The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …