Μελετητής Google

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 268 Σχετικά άρθρα Όλες οι 11 εκδοχές

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

A comprehensive review of multimodal large language models: Performance and challenges across different tasks

J Wang, H Jiang, Y Liu, C Ma, X Zhang, Y Pan… - arxiv preprint arxiv …, 2024 - arxiv.org

In an era defined by the explosive growth of data and rapid technological advancements,
Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 24 Σχετικά άρθρα Όλες οι 3 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

JZ Wu, Y Ge, X Wang, SW Lei, Y Gu… - Proceedings of the …, 2023 - openaccess.thecvf.com

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale
video datasets to train a text-to-video (T2V) generator. Despite their promising results, such …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 731 Σχετικά άρθρα Όλες οι 4 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Make-a-video: Text-to-video generation without text-video data

U Singer, A Polyak, T Hayes, X Yin, J An… - arxiv preprint arxiv …, 2022 - arxiv.org

We propose Make-A-Video--an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 1234 Σχετικά άρθρα Όλες οι 3 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Muse: Text-to-image generation via masked generative transformers

H Chang, H Zhang, J Barber, AJ Maschinot… - arxiv preprint arxiv …, 2023 - arxiv.org

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image
generation performance while being significantly more efficient than diffusion or …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 496 Σχετικά άρθρα Όλες οι 6 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer

We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 131 Σχετικά άρθρα Όλες οι 3 εκδοχές

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 149 Σχετικά άρθρα Όλες οι 3 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] 3dvar.com

[PDF][PDF] Scaling autoregressive models for content-rich text-to-image generation

J Yu, Y Xu, JY Koh, T Luong, G Baid, Z Wang… - arxiv preprint arxiv …, 2022 - 3dvar.com

Abstract We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 1105 Σχετικά άρθρα Όλες οι 5 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] mlr.press

Grounding language models to images for multimodal inputs and outputs

JY Koh, R Salakhutdinov… - … Conference on Machine …, 2023 - proceedings.mlr.press

We propose an efficient method to ground pretrained text-only language models to the
visual domain, enabling them to process arbitrarily interleaved image-and-text data, and …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 194 Σχετικά άρθρα Όλες οι 7 εκδοχές Προβολή ως HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Audiolm: a language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 602 Σχετικά άρθρα Όλες οι 5 εκδοχές

Δημιουργία ειδοποίησης

Παράθεση

Σύνθετη αναζήτηση

Αποθηκεύτηκε στη Βιβλιοθήκη μου

Vector-quantized image modeling with improved vqgan

Multimodal image synthesis and editing: A survey and taxonomy

A comprehensive review of multimodal large language models: Performance and challenges across different tasks

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

Make-a-video: Text-to-video generation without text-video data

Muse: Text-to-image generation via masked generative transformers

Photorealistic video generation with diffusion models

Sequential modeling enables scalable learning for large vision models

[PDF][PDF] Scaling autoregressive models for content-rich text-to-image generation

Grounding language models to images for multimodal inputs and outputs

Audiolm: a language modeling approach to audio generation