Embedding large language models into extended reality: Opportunities and challenges for inclusion, engagement, and privacy
Advances in artificial intelligence and human-computer interaction will likely lead to
extended reality (XR) becoming pervasive. While XR can provide users with interactive …
Robot learning in the era of foundation models: A survey
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …
SpiRit-LM: Interleaved Spoken and Written Language Model
We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …
A survey on detection of LLMs-generated content
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT
have led to an increase in synthetic content generation with implications across a variety of …
Make-a-voice: Revisiting voice large language models as scalable multilingual and multitask learners
Large language models (LLMs) have successfully served as a general-purpose interface
across multiple tasks and languages, while the adaptation of voice LLMs is mostly designed …
Can Whisper Perform Speech-Based In-Context Learning?
This paper investigates the in-context learning abilities of the Whisper automatic speech
recognition (ASR) models released by OpenAI. A novel speech-based in-context learning …
Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …
Joint prediction and denoising for large-scale multilingual self-supervised learning
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA)
methods due to the expenses and complexity required to handle many languages. This …
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …
CL-MASR: A continual learning benchmark for multilingual ASR
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it
possible to transcribe audio in multiple languages with a single model. However, current …