- Academic Search

Automated audio captioning: An overview of recent progress and new challenges

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer

Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

Cited by 60 · 11 versions

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 146 · 6 versions

VALOR: Vision-audio-language omni-perception pretraining model and dataset

S Chen, X He, L Guo, X Zhu, W Wang, J Tang… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we propose a Vision-Audio-Language Omni-peRception pretraining model
(VALOR) for multi-modal understanding and generation. Different from widely-studied vision …

Cited by 96 · 4 versions

Beyond the status quo: A contemporary survey of advances and challenges in audio captioning

X Xu, Z Xie, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …

Cited by 15 · 2 versions

Audio captioning transformer

X Mei, X Liu, Q Huang, MD Plumbley… - arXiv preprint arXiv …, 2021 - arxiv.org

Audio captioning aims to automatically generate a natural language description of an audio
clip. Most captioning models follow an encoder-decoder architecture, where the decoder …

Cited by 94 · 9 versions

Prefix tuning for automated audio captioning

M Kim, K Sung-Bin, TH Oh - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Audio captioning aims to generate text descriptions from environmental sounds. One
challenge of audio captioning is the difficulty of the generalization due to the lack of audio …

Cited by 47 · 3 versions

VALOR: Vision-audio-language omni-perception pretraining model and dataset

J Liu, S Chen, X He, L Guo, X Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

In this paper, we propose the Vision-Audio-Language Omni-peRception pretraining model
(VALOR) for multimodal understanding and generation. Unlike widely-studied vision …

Cited by 6 · 6 versions

Investigating local and global information for automated audio captioning with transfer learning

X Xu, H Dinkel, M Wu, Z Xie, K Yu - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Automated audio captioning (AAC) aims at generating summarizing descriptions for audio
clips. Multitudinous concepts are described in an audio caption, ranging from local …

Cited by 68 · 7 versions

An encoder-decoder based audio captioning system with transfer and reinforcement learning

X Mei, Q Huang, X Liu, G Chen, J Wu, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org

Automated audio captioning aims to use natural language to describe the content of audio
data. This paper presents an audio captioning system with an encoder-decoder architecture …

Cited by 53 · 8 versions

Unified model for image, video, audio and language tasks

M Shukor, C Dancette, A Rame, M Cord - arXiv preprint arXiv:2307.16184, 2023 - arxiv.org

Large Language Models (LLMs) have made the ambitious quest for generalist agents
significantly far from being a fantasy. A key hurdle for building such general models is the …

Cited by 16 · 2 versions
