Sparks of large audio models: A survey and outlook
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …
Acoustic scene classification: a comprehensive survey
Acoustic scene classification (ASC) has gained significant interest recently due to its diverse
applications. Various audio signal processing and machine learning methods have been …
AudioLDM: Text-to-audio generation with latent diffusion models
Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general
audio based on text descriptions. However, previous studies in TTA have limited generation …
On the use of AI-based tools like ChatGPT to support management research
Purpose: The article discusses the current relevance of artificial intelligence (AI) in research
and how AI improves various research methods. This article focuses on the practical case …
Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation
Contrastive learning has shown remarkable success in the field of multimodal
representation learning. In this paper, we propose a pipeline of contrastive language-audio …
CLAP: Learning audio concepts from natural language supervision
Mainstream machine listening models are trained to learn audio concepts under the
paradigm of one class label to many recordings focusing on one task. Learning under such …
WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …
Masked autoencoders that listen
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-
supervised representation learning from audio spectrograms. Following the Transformer …
Dawn of the transformer era in speech emotion recognition: closing the valence gap
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …
BEATs: Audio pre-training with acoustic tokenizers
The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …