Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from
reference speech using self-supervised learning (SSL) speech representations, can …
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
This paper proposes a zero-shot text-to-speech (TTS) conditioned by a self-supervised
speech-representation model acquired through self-supervised learning (SSL) …
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
In this paper, we present a novel method for phoneme-level prosody control of F0 and
duration using intuitive discrete labels. We propose an unsupervised prosodic clustering …
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning
Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely,
Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to …
Speech rhythm-based speaker embeddings extraction from phonemes and phoneme duration for multi-speaker speech synthesis
This paper proposes a speech rhythm-based method for speaker embeddings to model
phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the …
Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
In this paper, we investigate the impact of speech temporal dynamics in application to
automatic speaker verification and speaker voice anonymization tasks. We propose several …
Incorporating Speaker's Speech Rate Features for Improved Voice Cloning
Q Zhe, I Katunobu - 2023 9th International Conference on …, 2023 - ieeexplore.ieee.org
We investigate a neural network-based text-to-speech (TTS) synthesis system that aims to
simulate the Mandarin voice of different speakers using short voice samples. Our system …
Creating "Shido Twin" by Using Another Me Technology NTT Digital Twin Computing Research Center NTT Human Informatics Laboratories
A Fukayama, R Ishii, A Morikawa, H Noto, S Eitoku… - rd.ntt
“Cho Kabuki 2022 Powered by NTT,” a kabuki play sponsored by Shochiku Co., Ltd., is the
first social implementation of Another Me, a technology for creating a human digital twin that …
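Speech Pseudonymization Considering Prosodic Features
伊藤葵, 伊藤克亘 - Proceedings of the 86th National Convention, 2024 - ipsj.ixsq.nii.ac.jp
Abstract: Protecting speaker privacy through speech pseudonymization makes it possible to use
information contained in the speech data itself (such as the speaker's intent) that cannot be read from a transcript. In this paper …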
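Realizing Natural Voice Cloning Based on Speech Rate Modeling
秦哲, 伊藤克亘 - Proceedings of the 85th National Convention, 2023 - ipsj.ixsq.nii.ac.jp
Abstract: Voice cloning is a technique that extracts a speaker's characteristics to generate TTS
that speaks in that speaker's voice. In prior work on voice cloning, increasing the amount of input speech yields more natural …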