Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects
S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …
applications to human–computer interaction. The expression of human emotion depends on …
[HTML][HTML] Battery safety: Machine learning-based prognostics
Lithium-ion batteries play a pivotal role in a wide range of applications, from electronic
devices to large-scale electrified transportation systems and grid-scale energy storage …
devices to large-scale electrified transportation systems and grid-scale energy storage …
Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …
text-to-video generation. Its application to audio still lags behind for two main reasons: the …
Unireplknet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …
research attention but two unresolved and critical issues demand further investigation. 1) …
Beats: Audio pre-training with acoustic tokenizers
The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …
vision, speech, and audio domains over the past few years. While discrete label prediction is …
Masked autoencoders that listen
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-
supervised representation learning from audio spectrograms. Following the Transformer …
supervised representation learning from audio spectrograms. Following the Transformer …
Wavlm: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
exploration has been attempted for other speech processing tasks. As speech signal …
Automatic speech recognition using advanced deep learning approaches: A survey
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
Mulan: A joint embedding of music audio and natural language
Music tagging and content-based retrieval systems have traditionally been constructed
using pre-defined ontologies covering a rigid set of music attributes or text queries. This …
using pre-defined ontologies covering a rigid set of music attributes or text queries. This …
Contrastive audio-visual masked autoencoder
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single
modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio …
modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio …