Voicebox: Text-guided multilingual universal speech generation at scale
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
Textually pretrained speech language models
Speech language models (SpeechLMs) process and generate acoustic data only, without
textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using …
Textless speech-to-speech translation on real data
We present a textless speech-to-speech translation (S2ST) system that can translate speech
from one language into another language and can be built without the need of any text data …
ESPnet2-TTS: Extending the edge of TTS research
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …
Improving grammatical error correction with multimodal feature integration
Grammatical error correction (GEC) is a promising task aimed at correcting errors in a text.
Many methods have been proposed to facilitate this task with remarkable results. However …
Speaking style conversion in the waveform domain using discrete self-supervised units
We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and
timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice …
Phonetic analysis of self-supervised representations of English speech
We present an analysis of discrete units discovered via self-supervised representation
learning on English speech. We focus on units produced by a pre-trained HuBERT model …
A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …
Scaling properties of speech language models
Speech Language Models (SLMs) aim to learn language from raw audio, without textual
resources. Despite significant advances, our current models exhibit weak syntax and …