Unleashing the potential of conversational AI: Amplifying Chat-GPT's capabilities and tackling technical hurdles
Conversational AI has seen a growing interest among government, researchers, and
industrialists. This comprehensive survey paper provides an in-depth analysis of large …
Cross-speaker style transfer for text-to-speech using data augmentation
We address the problem of cross-speaker style transfer for text-to-speech (TTS) using data
augmentation via voice conversion. We assume to have a corpus of neutral non-expressive …
MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating Text and Speech
TY Kim, J Yang, E Park - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
Recently, sentiment analysis research has made significant improvements in addressing
sentiment and subjectivity within textual content. The advent of multimodal deep learning …
Improving prosody for cross-speaker style transfer by semi-supervised style extractor and hierarchical modeling in speech synthesis
C Qiang, P Yang, H Che, Y Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Cross-speaker style transfer in speech synthesis aims at transferring a style from source
speaker to synthesized speech of a target speaker's timbre. In most previous methods, the …
Style-label-free: Cross-speaker style transfer by quantized vae and speaker-wise normalization in speech synthesis
C Qiang, P Yang, H Che, X Wang… - 2022 13th International …, 2022 - ieeexplore.ieee.org
Cross-speaker style transfer in speech synthesis aims at transferring a style from source
speaker to synthesised speech of a target speaker's timbre. Most previous approaches rely …
A framework for intelligent building information spoken dialogue system (iBISDS)
Existing Building Information Modeling (BIM) information extraction (IE) methods require
users to spend more time learning different query languages and database structures, which …
Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset
MJ Khalifah, M Ptaszynski, F Masui - IEEE Access, 2024 - ieeexplore.ieee.org
This study explores the feasibility of using artificial emotional speech datasets generated by
existing artificial voice-generating software as an alternative to human-generated datasets …
Smart Glasses: A Visual Assistant for Visually Impaired
The blind people cannot read the text; they will suffer a lot in their day-to-day lives to handle
this. Many techniques were introduced, but they didn't provide better accuracy. The main aim …
FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS
In this paper, we propose a method to flexibly control the local prosodic variation of a neural
text-to-speech (TTS) model. To provide expressiveness for synthesized speech …
Synchronized Speech and Video Synthesis
AS Barve, P Madhani, Y Ghule… - … on Smart Computing …, 2023 - ieeexplore.ieee.org
The paper proposes a method to generate expressive talking head videos artificially in any
context by synthesizing and syncing speech and video, with the only provided inputs of a …