Computational bioacoustics with deep learning: a review and roadmap
D Stowell - PeerJ, 2022 - peerj.com
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain
valuable evidence about animal behaviours, populations and ecosystems. They are studied …
Acoustic scene classification: a comprehensive survey
Acoustic scene classification (ASC) has gained significant interest recently due to its diverse
applications. Various audio signal processing and machine learning methods have been …
WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …
HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection
Audio classification is an important task of mapping audio samples into their corresponding
labels. Recently, the transformer model with self-attention mechanisms has been adopted in …
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs
This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …
Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation
Crowdsourcing is a popular tool for collecting large amounts of annotated data, but the
specific format of the strong labels necessary for sound event detection is not easily …
Sound-guided semantic image manipulation
The recent success of the generative model shows that leveraging the multi-modal
embedding space can manipulate an image using text information. However, manipulating …
Automated audio captioning: An overview of recent progress and new challenges
Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …
Acoustic emission and artificial intelligence procedure for crack source localization
The acoustic emission (AE) technique is one of the most widely used in the field of structural
monitoring. Its popularity mainly stems from the fact that it belongs to the category of non …
MI: Multi-modal Models Membership Inference
With the development of machine learning techniques, the attention of research has been
moved from single-modal learning to multi-modal learning, as real-world data exist in the …