Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Towards audio language modeling--an overview
Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …
reduce transmission latency. Researchers recently discovered the potential of codecs as …
Vall-e 2: Neural codec language models are human parity zero-shot text to speech synthesizers
This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …
Funcodec: A fundamental, reproducible and integrable open-source toolkit for neural speech codec
This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an
extension of the open-source speech processing toolkit FunASR. FunCodec provides …
extension of the open-source speech processing toolkit FunASR. FunCodec provides …
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …
images, video, speech, and audio. A crucial component of these models is the codec …
Codec-superb@ slt 2024: A lightweight benchmark for neural audio codec models
Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …
tokenizers for audio, enabling efficient transmission or facilitating speech language …
Advancing large language models to capture varied speaking styles and respond properly in spoken conversations
In spoken dialogue, even if two current turns are the same sentence, their responses might
still differ when they are spoken in different styles. The spoken styles, containing …
still differ when they are spoken in different styles. The spoken styles, containing …
F5-tts: A fairytaler that fakes fluent and faithful speech with flow matching
This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on
flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as …
flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as …
Repcodec: A speech representation codec for speech tokenization
With recent rapid growth of large language models (LLMs), discrete speech tokenization has
played an important role for injecting speech into LLMs. However, this discretization gives …
played an important role for injecting speech into LLMs. However, this discretization gives …
Codec-SUPERB: An in-depth analysis of sound codec models
The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …
tokenizers underscore its critical importance. Recent years have witnessed significant …
APCodec: A neural audio codec with parallel amplitude and phase spectrum encoding and decoding
This paper introduces a novel neural audio codec targeting high waveform sampling rates
and low bitrates named APCodec, which seamlessly integrates the strengths of parametric …
and low bitrates named APCodec, which seamlessly integrates the strengths of parametric …