Privacy-preserving voice analysis via disentangled representations
Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home
assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient …
A study of bias mitigation strategies for speaker recognition
Speaker recognition is increasingly used in several everyday applications including smart
speakers, customer care centers and other speech-driven analytics. It is crucial to accurately …
Contrastive self-supervised speaker embedding with sequential disentanglement
Contrastive self-supervised learning has been widely used in speaker embedding to
address the labeling challenge. Contrastive speaker embedding assumes that the contrast …
Learning disentangled phone and speaker representations in a semi-supervised VQ-VAE paradigm
We present a new approach to disentangle speaker voice and phone content by introducing
new components to the VQ-VAE architecture for speech synthesis. The original VQ-VAE …
Contrastive speaker embedding with sequential disentanglement
Contrastive speaker embedding assumes that the contrast between the positive and
negative pairs of speech segments is attributed to speaker identity only. However, this …
Random cycle loss and its application to voice conversion
Speech disentanglement aims to decompose independent causal factors of speech signals
into separate codes. Perfect disentanglement benefits a broad range of speech …
Paralinguistic privacy protection at the edge
Voice user interfaces and digital assistants are rapidly entering our lives and becoming
singular touch points spanning our devices. These always-on services capture and transmit …
Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition
H Li, Y Kim, CH Kuo, S Narayanan
… generalized automatic emotion recognition systems include
scarcity of labeled data and lack of gold-standard references. Even for the cues that are …
Exploring disentanglement with multilingual and monolingual VQ-VAE
This work examines the content and usefulness of disentangled phone and speaker
representations from two separately trained VQ-VAE systems: one trained on multilingual …
Large-Scale Functional Connectome Fingerprinting for Generalization and Transfer Learning in Neuroimaging
Functional MRI currently supports a limited application space stemming from modest dataset
sizes, large interindividual variability and heterogeneity among scanning protocols. These …