Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
A comprehensive review on generative AI for education
Artificial Intelligence (AI) has immense potential for personalized learning experiences,
content generation, and vivid educational support. This paper delves into generative AI (GAI) …
Simple and controllable music generation
We tackle the task of conditional music generation. We introduce MusicGen, a single
Language Model (LM) that operates over several streams of compressed discrete music …
Shap-E: Generating conditional 3D implicit functions
H Jun, A Nichol - arXiv preprint arXiv:2305.02463, 2023 - arxiv.org
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D
generative models which produce a single output representation, Shap-E directly generates …
Towards generalist biomedical AI
Background: Medicine is inherently multimodal, requiring the simultaneous interpretation
and integration of insights between many data modalities spanning text, imaging, genomics …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
High-fidelity audio compression with improved RVQGAN
Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high-quality neural …
On the robustness of ChatGPT: An adversarial and out-of-distribution perspective
ChatGPT is a recent chatbot service released by OpenAI and has been receiving increasing
attention over the past few months. While evaluations of various aspects of ChatGPT have …
Generative AI
Tom Freston is credited with saying "Innovation is taking two things that exist and putting
them together in a new way". For a long time in history, it has been the prevailing …