Edge computing on IoT for machine signal processing and fault diagnosis: A review

S Lu, J Lu, K An, X Wang, Q He - IEEE Internet of Things …, 2023 - ieeexplore.ieee.org
Edge computing is an emerging paradigm that offloads the computations and analytics
workloads onto the Internet of Things (IoT) edge devices to accelerate the computation …

Towards audio language modeling--an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arxiv preprint arxiv …, 2024 - arxiv.org
Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Simple and controllable music generation

J Copet, F Kreuk, I Gat, T Remez… - Advances in …, 2023 - proceedings.neurips.cc
We tackle the task of conditional music generation. We introduce MusicGen, a single
Language Model (LM) that operates over several streams of compressed discrete music …

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2023 - proceedings.neurips.cc
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arxiv preprint arxiv …, 2023 - arxiv.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called Vall-E) using discrete codes derived from …

High-fidelity audio compression with improved rvqgan

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Symphonize 3d semantic scene completion with contextual instance queries

H Jiang, T Cheng, N Gao, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal
undertaking in autonomous driving aiming to predict the voxel occupancy within volumetric …

Qwen-audio: Advancing universal audio understanding via unified large-scale audio-language models

Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan… - arxiv preprint arxiv …, 2023 - arxiv.org
Recently, instruction-following audio-language models have received broad attention for
audio interaction with humans. However, the absence of pre-trained audio models capable …

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arxiv preprint arxiv …, 2023 - arxiv.org
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Audiogpt: Understanding and generating speech, music, sound, and talking head

R Huang, M Li, D Yang, J Shi, X Chang, Z Ye… - Proceedings of the …, 2024 - ojs.aaai.org
Large language models (LLMs) have exhibited remarkable capabilities across a variety of
domains and tasks, challenging our understanding of learning and cognition. Despite the …