Speak foreign languages with your own voice: Cross-lingual neural codec language modeling
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual
speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec …
Reproducing whisper-style training using an open-source toolkit and publicly available data
Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …
End-to-end speech-to-text translation: A survey
N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …
M3ST: Mix at Three Levels for Speech Translation
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …
Findings of the IWSLT 2023 evaluation campaign
This paper reports on the shared tasks organized by the 20th IWSLT Conference. The
shared tasks address 9 scientific challenges in spoken language translation: simultaneous …
Speech translation with large language models: An industrial practice
Given the great success of large language models (LLMs) across various tasks, in this
paper, we introduce LLM-ST, a novel and effective speech translation model constructed …
Vec-tok speech: speech vectorization and tokenization for neural speech generation
Language models (LMs) have recently flourished in natural language processing and
computer vision, generating high-fidelity texts or images in various tasks. In contrast, the …
On the effects of heterogeneous data sources on speech-to-text foundation models
The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full
transparency in building advanced speech-to-text (S2T) foundation models. To this end …
LAMASSU: A streaming language-agnostic multilingual speech recognition and translation model using neural transducers
Automatic speech recognition (ASR) and speech translation (ST) can both use neural
transducers as the model structure. It is thus possible to use a single transducer model to …
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
With the rise of Speech Large Language Models (Speech LLMs), there has been growing
interest in discrete speech tokens for their ability to integrate with text-based tokens …