Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net
Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Revision: Rendering tools enable spatial fidelity in vision-language models

A Chatterjee, Y Luo, T Gokhale, Y Yang… - European Conference on …, 2024 - Springer
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been
adopted in solutions for several computer vision and multimodal learning tasks. However, it …

Retrieval-augmented generation for ai-generated content: A survey

P Zhao, H Zhang, Q Yu, Z Wang, Y Geng, F Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …

Structure-Guided Adversarial Training of Diffusion Models

L Yang, H Qian, Z Zhang, J Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …

GenArtist: Multimodal LLM as an agent for unified image generation and editing

Z Wang, A Li, Z Li, X Liu - arXiv preprint arXiv:2407.05600, 2024 - arxiv.org
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

Cross-modal contextualized diffusion models for text-guided visual generation and editing

L Yang, Z Zhang, Z Yu, J Liu, M Xu… - The Twelfth …, 2024 - openreview.net
Conditional diffusion models have exhibited superior performance in high-fidelity text-
guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion …

Lion: Implicit vision prompt tuning

H Wang, J Chang, Y Zhai, X Luo, J Sun, Z Lin… - Proceedings of the …, 2024 - ojs.aaai.org
Despite recent promising performances across a range of vision tasks, vision Transformers
still have an issue of high computational costs. Recently, vision prompt learning has …

DiT4Edit: Diffusion transformer for image editing

K Feng, Y Ma, B Wang, C Qi, H Chen, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advances in UNet-based image editing, methods for shape-aware object
editing in high-resolution images are still lacking. Compared to UNet, Diffusion Transformers …

Gloss-driven Conditional Diffusion Models for Sign Language Production

S Tang, F Xue, J Wu, S Wang, R Hong - ACM Transactions on …, 2024 - dl.acm.org
Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …