Google Scholar

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

J Chen, J Yu, C Ge, L Yao, E Xie, Y Wu, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

The most advanced text-to-image (T2I) models require significant training costs (e.g., millions
of GPU hours), seriously hindering the fundamental innovation for the AIGC community …

Cited by 387 | All 3 versions

[PDF] arxiv.org

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2024 - Springer

Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Cited by 62 | All 2 versions

[PDF] thecvf.com

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Cited by 219 | All 4 versions

[PDF] arxiv.org

Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Cited by 72 | All 3 versions

[PDF] openreview.net

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net

Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Cited by 86 | All 3 versions

[PDF] arxiv.org

Revision: Rendering tools enable spatial fidelity in vision-language models

A Chatterjee, Y Luo, T Gokhale, Y Yang… - European Conference on …, 2024 - Springer

Text-to-Image (T2I) and multimodal large language models (MLLMs) have been
adopted in solutions for several computer vision and multimodal learning tasks. However, it …

Cited by 3 | All 9 versions

[PDF] frontiersin.org

Deepfake: definitions, performance metrics and standards, datasets, and a meta-review

E Altuncu, VNL Franqueira, S Li - Frontiers in Big Data, 2024 - frontiersin.org

Recent advancements in AI, especially deep learning, have contributed to a significant
increase in the creation of new realistic-looking synthetic media (video, image, and audio) …

Cited by 4 | All 7 versions

[PDF] arxiv.org

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

Cited by 42 | All 3 versions

[PDF] thecvf.com

Ranni: Taming text-to-image diffusion for accurate instruction following

Y Feng, B Gong, D Chen, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Existing text-to-image (T2I) diffusion models usually struggle to interpret complex prompts,
especially those with quantity, object-attribute binding, and multi-subject descriptions. In this …

Cited by 31 | All 3 versions

[PDF] arxiv.org

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

Cited by 18 | All 3 versions
