- Academic Search

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 75

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 151

[PDF] neurips.cc

Photorealistic text-to-image diffusion models with deep language understanding

C Saharia, W Chan, S Saxena, L Li… - Advances in neural …, 2022 - proceedings.neurips.cc

We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …

Cited by 5621

[PDF] neurips.cc

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc

Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

Cited by 1439

[PDF] arxiv.org

When and why vision-language models behave like bags-of-words, and what to do about it?

M Yuksekgonul, F Bianchi, P Kalluri, D Jurafsky… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite the success of large vision and language models (VLMs) in many downstream
applications, it is unclear how well they encode compositional information. Here, we create …

Cited by 323

[PDF] neurips.cc

T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation

K Huang, K Sun, E Xie, Z Li… - Advances in Neural …, 2023 - proceedings.neurips.cc

Despite the stunning ability to generate high-quality images by recent text-to-image models,
current approaches often struggle to effectively compose objects with different attributes and …

Cited by 168

[PDF] arxiv.org

Text-to-image diffusion models in generative AI: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org

This survey reviews text-to-image diffusion models in the context that diffusion models have
emerged to be popular for a wide range of generative tasks. As a self-contained work, this …

Cited by 337

[PDF] neurips.cc

SugarCrepe: Fixing hackable benchmarks for vision-language compositionality

CY Hsieh, J Zhang, Z Ma… - Advances in neural …, 2024 - proceedings.neurips.cc

In the last year alone, a surge of new benchmarks to measure compositional
understanding of vision-language models have permeated the machine learning ecosystem …

Cited by 104

[PDF] acm.org

Easily accessible text-to-image generation amplifies demographic stereotypes at large scale

F Bianchi, P Kalluri, E Durmus, F Ladhak… - Proceedings of the …, 2023 - dl.acm.org

Machine learning models that convert user-written text descriptions into images are now
widely available online and used by millions of users to generate millions of images a day …

Cited by 284

[PDF] thecvf.com

TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Despite thousands of researchers, engineers, and artists actively working on improving
text-to-image generation models, systems often fail to produce images that accurately align with …

Cited by 166

DALL-Eval: Probing the reasoning skills and social biases of text-to-image generative transformers

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
