Sparks of artificial general intelligence: Early experiments with GPT-4
S Bubeck, V Chandrasekaran, R Eldan… - ar…
… and refining large language
models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks …
Reinforcement learning for fine-tuning text-to-image diffusion models
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …
What you see is what you read? improving text-image alignment evaluation
Automatically determining whether a text and a corresponding image are semantically
aligned is a significant challenge for vision-language models, with applications in generative …
Discriminative probing and tuning for text-to-image generation
Despite advancements in text-to-image generation (T2I), prior methods often face text-image
misalignment problems such as relation confusion in generated images. Existing solutions …
Revision: Rendering tools enable spatial fidelity in vision-language models
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been
adopted in solutions for several computer vision and multimodal learning tasks. However, it …
Compositional abilities emerge multiplicatively: Exploring diffusion models on a synthetic task
Modern generative models exhibit unprecedented capabilities to generate extremely
realistic data. However, given the inherent compositionality of the real world, reliable use of …
Controllable text-to-image generation with GPT-4
Current text-to-image generation models often struggle to follow textual instructions,
especially the ones requiring spatial reasoning. On the other hand, Large Language Models …
Unsupervised compositional concepts discovery with text-to-image generative models
Text-to-image generative models have enabled high-resolution image synthesis across
different domains, but require users to specify the content they wish to generate. In this …
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Recent vision-language (VL) models are powerful, but can they reliably distinguish "right"
from "left"? We curate three new corpora to quantify model comprehension of such basic …