Recent advances in speech language models: A survey

W Cui, D Yu, X Jiao, Z Meng, G Zhang, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have recently garnered significant attention, primarily for
their capabilities in text-based interactions. However, natural human interaction often relies …

ConferencingSpeech 2022 challenge: Non-intrusive objective speech quality assessment (NISQA) challenge for online conferencing applications

G Yi, W Xiao, Y Xiao, B Naderi, S Möller… - arXiv preprint arXiv …, 2022 - arxiv.org
With the advances in speech communication systems such as online conferencing
applications, we can seamlessly work with people regardless of where they are. However …

The VoiceMOS Challenge 2023: zero-shot subjective speech quality prediction for multiple domains

E Cooper, WC Huang, Y Tsao… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the second edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthesized …

TorchAudio-Squim: Reference-less speech quality and intelligibility measures in TorchAudio

A Kumar, K Tan, Z Ni, P Manocha… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Measuring quality and intelligibility of a speech signal is usually a critical step in
development of speech processing systems. To enable this, a variety of metrics to measure …

SMACK: Semantically Meaningful Adversarial Audio Attack

Z Yu, Y Chang, N Zhang, C Xiao - 32nd USENIX Security Symposium …, 2023 - usenix.org
Voice controllable systems rely on speech recognition and speaker identification as the key
enabling technologies. While they bring revolutionary changes to our daily lives, their …

ICASSP 2024 speech signal improvement challenge

NC Ristea, B Naderi, A Saabas, R Cutler… - IEEE Open Journal …, 2025 - ieeexplore.ieee.org
The ICASSP 2024 Speech Signal Improvement Challenge aims to advance research in
enhancing speech signal quality within communication systems. The speech signal quality …

AntiFake: Using adversarial audio to prevent unauthorized speech synthesis

Z Yu, S Zhai, N Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org
The rapid development of deep neural networks and generative AI has catalyzed growth in
realistic speech synthesis. While this technology has great potential to improve lives, it also …

When evil calls: Targeted adversarial voice over IP network

H Liu, Z Yu, M Zha, XF Wang, W Yeoh… - Proceedings of the …, 2022 - dl.acm.org
As the COVID-19 pandemic fundamentally reshaped the remote life and working styles,
Voice over IP (VoIP) telephony and video conferencing have become a primary method of …

Speech quality assessment through MOS using non-matching references

P Manocha, A Kumar - arXiv preprint arXiv:2206.12285, 2022 - arxiv.org
Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way
to assess the quality of speech signals. However, several recent attempts to automatically …

Can I hear your face? pervasive attack on voice authentication systems with a single face image

N Jiang, B Sun, T Sim, J Han - 33rd USENIX Security Symposium …, 2024 - usenix.org
We present Foice, a novel deepfake attack against voice authentication systems. Foice
generates a synthetic voice of the victim from just a single image of the victim's face, without …