Google Scholar

[HTML] Speech emotion recognition using transfer learning: Integration of advanced speaker embeddings and image recognition models

M Jakubec, E Lieskovska, R Jarina, M Spisiak… - Applied Sciences, 2024 - mdpi.com

Automatic Speech Emotion Recognition (SER) plays a vital role in making human–computer
interactions more natural and effective. A significant challenge in SER development is the …

Cited by 2

[PDF] ssrn.com

Enhancing text generation from knowledge graphs with cross-structure attention distillation

X Shi, Z Xia, P Cheng, Y Li - Engineering Applications of Artificial …, 2024 - Elsevier

Existing large-scale pre-trained language models (PLMs) can effectively enhance the
knowledge-graph-to-text (KG-to-text) generation by processing the linearized version of a …

Cited by 2

[PDF] arxiv.org

Representation Purification for End-to-End Speech Translation

C Zhang, Y Zhou, R Zhao, Y Chen, X Shi - arXiv preprint arXiv:2412.04266, 2024 - arxiv.org

Speech-to-text translation (ST) is a cross-modal task that involves converting spoken
language into text in a different language. Previous research primarily focused on …

Enhancing multimodal translation: Achieving consistency among visual information, source language and target language

X Shi, X Yang, P Cheng, Y Zhou, J Liu - Neurocomputing, 2025 - Elsevier

Multimodal machine translation refers to the task of using information from images, videos,
etc., to assist in text translation. Numerous studies have demonstrated that incorporating …

[PDF] arxiv.org

Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2

C Xu, EW Sun - arXiv preprint arXiv:2407.14212, 2024 - arxiv.org

An increasing number of Chinese people are troubled by different degrees of visual
impairment, which has made the modal conversion between a single image or video frame …

CCSRD: Content-centric speech representation disentanglement learning for end-to-end speech...
