SpeechPrompt v2: Prompt tuning for speech classification tasks
Prompt tuning is a technology that tunes a small set of parameters to steer a pre-trained
language model (LM) to directly generate the output for downstream tasks. Recently, prompt …
Joint audio and speech understanding
Humans are surrounded by audio signals that include both speech and non-speech sounds.
The recognition and understanding of speech and non-speech audio events, along with a …
RETRACTED ARTICLE: Age and gender classification using Seg-Net based architecture and machine learning
A facial recognition framework is a natural face-recognizing process from a computerized
image or videos. Nowadays, for real-time applications, i.e., human–computer interaction …
UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …
Cross-age speaker verification: Learning age-invariant speaker embeddings
Automatic speaker verification has achieved remarkable progress in recent years. However,
there is little research on cross-age speaker verification (CASV) due to insufficient relevant …
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
In forensic voice comparison, deep learning has become widely popular recently. It is mainly
used to learn speaker representations, called embeddings or embedding vectors. Speaker …
VoicePM: A robust privacy measurement on voice anonymity
Voice-based human-computer interaction has become pervasive in laptops, smartphones,
home voice assistants, and Internet of Things (IoT) devices. However, voice interaction comes …
Investigating Long-Term and Short-Term Time-Varying Speaker Verification
The performance of speaker verification systems can be adversely affected by time domain
variations. However, limited research has been conducted on time-varying speaker …
Challenges of using longitudinal and cross-domain corpora on studies of pathological speech.
Several promising works have reported very exciting results in the field of speech in health,
however there are still issues to address before deploying such systems into clinical …
Speech-based Age and Gender Prediction with Transformers
We report on the curation of several publicly available datasets for age and gender
prediction. Furthermore, we present experiments to predict age and gender with models …