Towards audio language modeling--an overview
Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …
Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
GIVT: Generative infinite-vocabulary transformers
We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …
SALM: Speech-augmented language model with in-context learning for speech recognition and translation
We present a novel Speech Augmented Language Model (SALM) with multitask and
in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …
LLMs meet multimodal generation and editing: A survey
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …
Codec-SUPERB: An in-depth analysis of sound codec models
The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …
VALL-T: Decoder-only generative transducer for robust and decoding-controllable text-to-speech
Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and
VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot …
Autoregressive diffusion transformer for text-to-speech synthesis
Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …
Mini-Omni2: Towards open-source GPT-4o with vision, speech and duplex capabilities
GPT-4o, an all-encompassing model, represents a milestone in the development of large
multi-modal language models. It can understand visual, auditory, and textual modalities …
Boosting large language model for speech synthesis: An empirical study
Large language models (LLMs) have made significant advancements in natural language
processing and are concurrently extending the language ability to other modalities, such as …