Embedding large language models into extended reality: Opportunities and challenges for inclusion, engagement, and privacy

E Bozkir, S Özdel, KHC Lau, M Wang, H Gao… - Proceedings of the 6th …, 2024 - dl.acm.org
Advances in artificial intelligence and human-computer interaction will likely lead to
extended reality (XR) becoming pervasive. While XR can provide users with interactive …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

SpiRit-LM: Interleaved Spoken and Written Language Model

TA Nguyen, B Muller, B Yu, MR Costa-Jussa… - Transactions of the …, 2025 - direct.mit.edu
We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …

A survey on detection of LLMs-generated content

X Yang, L Pan, X Zhao, H Chen, L Petzold… - arXiv preprint arXiv …, 2023 - arxiv.org
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT
have led to an increase in synthetic content generation with implications across a variety of …

Make-A-Voice: Revisiting voice large language models as scalable multilingual and multitask learners

R Huang, C Zhang, Y Wang, D Yang… - Proceedings of the …, 2024 - aclanthology.org
Large language models (LLMs) have successfully served as a general-purpose interface
across multiple tasks and languages, while the adaptation of voice LLMs is mostly designed …

Can Whisper Perform Speech-Based In-Context Learning?

S Wang, CH Yang, J Wu… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper investigates the in-context learning abilities of the Whisper automatic speech
recognition (ASR) models released by OpenAI. A novel speech-based in-context learning …

Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages

A Rouditchenko, S Khurana, S Thomas, R Feris… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …

Joint prediction and denoising for large-scale multilingual self-supervised learning

W Chen, J Shi, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA)
methods due to the expenses and complexity required to handle many languages. This …

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

CL-MASR: A continual learning benchmark for multilingual ASR

L Della Libera, P Mousavi, S Zaiem… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it
possible to transcribe audio in multiple languages with a single model. However, current …