SpeechPrompt v2: Prompt tuning for speech classification tasks

KW Chang, YK Wang, H Shen, I Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt tuning is a technology that tunes a small set of parameters to steer a pre-trained
language model (LM) to directly generate the output for downstream tasks. Recently, prompt …

Joint audio and speech understanding

Y Gong, AH Liu, H Luo, L Karlinsky… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Humans are surrounded by audio signals that include both speech and non-speech sounds.
The recognition and understanding of speech and non-speech audio events, along with a …

RETRACTED ARTICLE: Age and gender classification using Seg-Net based architecture and machine learning

S Kumar, S Singh, J Kumar, K Prasad - Multimedia Tools and Applications, 2022 - Springer
A facial recognition framework is a natural face-recognizing process from a computerized
image or videos. Nowadays, for real-time applications, i.e., human–computer interaction …

UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network

S Arora, H Futami, J Jung, Y Peng, R Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …

Cross-age speaker verification: Learning age-invariant speaker embeddings

X Qin, N Li, C Weng, D Su, M Li - arXiv preprint arXiv:2207.05929, 2022 - arxiv.org
Automatic speaker verification has achieved remarkable progress in recent years. However,
there is little research on cross-age speaker verification (CASV) due to insufficient relevant …

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

D Sztahó, A Fejes - Journal of Forensic Sciences, 2023 - Wiley Online Library
In forensic voice comparison, deep learning has become widely popular recently. It is mainly
used to learn speaker representations, called embeddings or embedding vectors. Speaker …

VoicePM: A robust privacy measurement on voice anonymity

S Zhang, Z Li, A Das - Proceedings of the 16th ACM Conference on …, 2023 - dl.acm.org
Voice-based human-computer interaction has become pervasive in laptops, smartphones,
home voice assistants, and Internet of Things (IoT) devices. However, voice interaction comes …

Investigating Long-Term and Short-Term Time-Varying Speaker Verification

X Qin, N Li, S Duan, M Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
The performance of speaker verification systems can be adversely affected by time domain
variations. However, limited research has been conducted on time-varying speaker …

Challenges of using longitudinal and cross-domain corpora on studies of pathological speech.

C Botelho, T Schultz, A Abad, I Trancoso - INTERSPEECH, 2022 - isca-archive.org
Several promising works have reported very exciting results in the field of speech in health,
however there are still issues to address before deploying such systems into clinical …

Speech-based Age and Gender Prediction with Transformers

F Burkhardt, J Wagner, H Wierstorf… - … 15th ITG Conference, 2023 - ieeexplore.ieee.org
We report on the curation of several publicly available datasets for age and gender
prediction. Furthermore, we present experiments to predict age and gender with models …