Embedding large language models into extended reality: Opportunities and challenges for inclusion, engagement, and privacy

E Bozkir, S Özdel, KHC Lau, M Wang, H Gao… - Proceedings of the 6th …, 2024 - dl.acm.org
Advances in artificial intelligence and human-computer interaction will likely lead to
extended reality (XR) becoming pervasive. While XR can provide users with interactive …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

SpiRit-LM: Interleaved Spoken and Written Language Model

TA Nguyen, B Muller, B Yu, MR Costa-Jussa… - Transactions of the …, 2025 - direct.mit.edu
We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …

A survey on detection of LLMs-generated content

X Yang, L Pan, X Zhao, H Chen, L Petzold… - arXiv preprint arXiv …, 2023 - arxiv.org
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT
have led to an increase in synthetic content generation with implications across a variety of …

Make-A-Voice: Revisiting voice large language models as scalable multilingual and multitask learners

R Huang, C Zhang, Y Wang, D Yang… - Proceedings of the …, 2024 - aclanthology.org
Large language models (LLMs) have successfully served as a general-purpose interface
across multiple tasks and languages, while the adaptation of voice LLMs is mostly designed …

Can Whisper Perform Speech-Based In-Context Learning?

S Wang, CH Yang, J Wu… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper investigates the in-context learning abilities of the Whisper automatic speech
recognition (ASR) models released by OpenAI. A novel speech-based in-context learning …

Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages

A Rouditchenko, S Khurana, S Thomas, R Feris… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …

Joint prediction and denoising for large-scale multilingual self-supervised learning

W Chen, J Shi, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA)
methods due to the expenses and complexity required to handle many languages. This …

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

CL-MASR: A continual learning benchmark for multilingual ASR

L Della Libera, P Mousavi, S Zaiem… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it
possible to transcribe audio in multiple languages with a single model. However, current …