Google Scholar

SSAST: Self-supervised audio spectrogram transformer

Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier

Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …

Cited by 95 · All 2 versions

[HTML] sciencedirect.com

[HTML] Battery safety: Machine learning-based prognostics

J Zhao, X Feng, Q Pang, M Fowler, Y Lian… - Progress in Energy and …, 2024 - Elsevier

Lithium-ion batteries play a pivotal role in a wide range of applications, from electronic
devices to large-scale electrified transportation systems and grid-scale energy storage …

Cited by 58 · All 3 versions

[PDF] mlr.press

Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press

Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Cited by 334 · All 7 versions

[PDF] thecvf.com

UniRepLKNet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …

Cited by 144 · All 6 versions

[PDF] arxiv.org

BEATs: Audio pre-training with acoustic tokenizers

S Chen, Y Wu, C Wang, S Liu, D Tompkins… - arXiv preprint arXiv …, 2022 - arxiv.org

The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …

Cited by 290 · All 9 versions

[PDF] neurips.cc

Masked autoencoders that listen

PY Huang, H Xu, J Li, A Baevski… - Advances in …, 2022 - proceedings.neurips.cc

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-
supervised representation learning from audio spectrograms. Following the Transformer …

Cited by 258 · All 6 versions

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1871 · All 7 versions

[PDF] arxiv.org

Automatic speech recognition using advanced deep learning approaches: A survey

H Kheddar, M Hemis, Y Himeur - Information Fusion, 2024 - Elsevier

Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …

Cited by 55 · All 4 versions

[PDF] arxiv.org

MuLan: A joint embedding of music audio and natural language

Q Huang, A Jansen, J Lee, R Ganti, JY Li… - arXiv preprint arXiv …, 2022 - arxiv.org

Music tagging and content-based retrieval systems have traditionally been constructed
using pre-defined ontologies covering a rigid set of music attributes or text queries. This …

Cited by 170 · All 6 versions

[PDF] arxiv.org

Contrastive audio-visual masked autoencoder

Y Gong, A Rouditchenko, AH Liu, D Harwath… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single
modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio …

Cited by 145 · All 5 versions
