End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Sequential contrastive audio-visual learning

I Tsiamas, S Pascual, C Yeh, J Serrà - arXiv preprint arXiv:2407.05782, 2024 - arxiv.org
Contrastive learning has emerged as a powerful technique in audio-visual representation
learning, leveraging the natural co-occurrence of audio and visual modalities in extensive …

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

TK Lam, M Gaido, S Papi, L Bentivogli… - arXiv preprint arXiv …, 2025 - arxiv.org
Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is
increasing interest in extending their capabilities to speech--the most common form in …

Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?

I Tsiamas, M Sperber, A Finch, S Garg - arXiv preprint arXiv:2410.24019, 2024 - arxiv.org
The prosody of a spoken utterance, including features like stress, intonation and rhythm, can
significantly affect the underlying semantics, and as a consequence can also affect its textual …

How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations

H Lee, D Liu, S Sinhamahapatra, J Niehues - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models aim to create a unified representation space that abstracts
away from surface features like language syntax or modality differences. To investigate this …

Contrastive Learning for Task-Independent SpeechLLM-Pretraining

M Züfle, J Niehues - arXiv preprint arXiv:2412.15712, 2024 - arxiv.org
Large language models (LLMs) excel in natural language processing but adapting these
LLMs to speech processing tasks efficiently is not straightforward. Direct task-specific fine …