- Academic Search

A comprehensive review of multimodal large language models: Performance and challenges across different tasks

J Wang, H Jiang, Y Liu, C Ma, X Zhang, Y Pan… - arXiv preprint arXiv …, 2024 - arxiv.org

In an era defined by the explosive growth of data and rapid technological advancements,
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …

Cited by: 21

AudioLM: a language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Cited by: 580

AudioGPT: Understanding and generating speech, music, sound, and talking head

R Huang, M Li, D Yang, J Shi, X Chang, Z Ye… - Proceedings of the …, 2024 - ojs.aaai.org

Large language models (LLMs) have exhibited remarkable capabilities across a variety of
domains and tasks, challenging our understanding of learning and cognition. Despite the …

Cited by: 175

Make-A-Voice: Unified voice synthesis with discrete representation

R Huang, C Zhang, Y Wang, D Yang, L Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Various applications of voice synthesis have been developed independently despite the fact
that they generate "voice" as output in common. In addition, the majority of voice synthesis …

Cited by: 30

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by: 6

Are discrete units necessary for spoken language modeling?

TA Nguyen, B Sagot, E Dupoux - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

Recent work in spoken language modeling shows the possibility of learning a language
unsupervisedly from raw audio without any text labels. The approach relies first on …

Cited by: 26

SpeechPrompt: Prompting speech language models for speech processing tasks

KW Chang, H Wu, YK Wang, YK Wu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Prompting has become a practical method for utilizing pre-trained language models (LMs).
This approach offers several advantages. It allows an LM to adapt to new tasks with minimal …

Cited by: 1

Disentangling prosody representations with unsupervised speech reconstruction

L Qu, T Li, C Weber, T Pekarek-Rosin… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Human speech can be characterized by different components, including semantic content,
speaker identity and prosodic information. Significant progress has been made in …

Cited by: 8

Paralinguistic privacy protection at the edge

R Aloufi, H Haddadi, D Boyle - ACM Transactions on Privacy and …, 2023 - dl.acm.org

Voice user interfaces and digital assistants are rapidly entering our lives and becoming
singular touch points spanning our devices. These always-on services capture and transmit …

Cited by: 14

Evolutionary Retrofitting

M Videau, M Zameshina, A Leite, L Najman… - arXiv preprint arXiv …, 2024 - arxiv.org

AfterLearnER (After Learning Evolutionary Retrofitting) consists in applying non-
differentiable optimization, including evolutionary methods, to refine fully-trained machine …

Cited by: 1

textless-lib: A library for textless spoken language processing
