Unleashing the potential of conversational AI: Amplifying Chat-GPT's capabilities and tackling technical hurdles

V Hassija, A Chakrabarti, A Singh, V Chamola… - IEEE …, 2023 - ieeexplore.ieee.org
Conversational AI has seen a growing interest among government, researchers, and
industrialists. This comprehensive survey paper provides an in-depth analysis of large …

Cross-speaker style transfer for text-to-speech using data augmentation

MS Ribeiro, J Roth, G Comini… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We address the problem of cross-speaker style transfer for text-to-speech (TTS) using data
augmentation via voice conversion. We assume to have a corpus of neutral non-expressive …

MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating Text and Speech

TY Kim, J Yang, E Park - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
Recently, sentiment analysis research has made significant improvements in addressing
sentiment and subjectivity within textual content. The advent of multimodal deep learning …

Improving prosody for cross-speaker style transfer by semi-supervised style extractor and hierarchical modeling in speech synthesis

C Qiang, P Yang, H Che, Y Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Cross-speaker style transfer in speech synthesis aims at transferring a style from source
speaker to synthesized speech of a target speaker's timbre. In most previous methods, the …

Style-label-free: Cross-speaker style transfer by quantized vae and speaker-wise normalization in speech synthesis

C Qiang, P Yang, H Che, X Wang… - 2022 13th International …, 2022 - ieeexplore.ieee.org
Cross-speaker style transfer in speech synthesis aims at transferring a style from source
speaker to synthesised speech of a target speaker's timbre. Most previous approaches rely …

A framework for intelligent building information spoken dialogue system (iBISDS)

N Wang, RR Issa, CJ Anumba - EG-ICE 2021 workshop on …, 2021 - books.google.com
Existing Building Information Modeling (BIM) information extraction (IE) methods require
users to spend more time learning different query languages and database structures, which …

Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset

MJ Khalifah, M Ptaszynski, F Masui - IEEE Access, 2024 - ieeexplore.ieee.org
This study explores the feasibility of using artificial emotional speech datasets generated by
existing artificial voice-generating software as an alternative to human-generated datasets …

Smart Glasses: A Visual Assistant for Visually Impaired

M Prabha, P Saraswathi, J Hailly… - … on Emerging Trends …, 2023 - ieeexplore.ieee.org
The blind people cannot read the text; they will suffer a lot in their day-to-day lives to handle
this. Many techniques were introduced, but they didn't provide better accuracy. The main aim …

FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS

C Kim, S Um, H Yoon, HG Kang - INTERSPEECH, 2022 - isca-archive.org
In this paper, we propose a method to flexibly control the local prosodic variation of a neural
text-to-speech (TTS) model. To provide expressiveness for synthesized speech …

Synchronized Speech and Video Synthesis

AS Barve, P Madhani, Y Ghule… - … on Smart Computing …, 2023 - ieeexplore.ieee.org
The paper proposes a method to generate expressive talking head videos artificially in any
context by synthesizing and syncing speech and video, with the only provided inputs of a …