GaussCtrl: Multi-view consistent text-driven 3D Gaussian splatting editing

J Wu, JW Bian, X Li, G Wang, I Reid, P Torr… - … on Computer Vision, 2024 - Springer
We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D
Gaussian Splatting (3DGS). Our method first renders a collection of images by using the …

TrailBlazer: Trajectory control for diffusion-based video generation

WDK Ma, JP Lewis, WB Kleijn - SIGGRAPH Asia 2024 Conference …, 2024 - dl.acm.org
Large text-to-video (T2V) models such as Sora have the potential to revolutionize visual
effects and the creation of some types of movies. Current T2V models require tedious trial …

BAMM: Bidirectional autoregressive motion model

E Pinyoanuntapong, MU Saleem, P Wang… - … on Computer Vision, 2024 - Springer
Generating human motion from text has been dominated by denoising motion models either
through diffusion or generative masking process. However, these models face great …

Mismatch quest: Visual and textual feedback for image-text misalignment

B Gordon, Y Bitton, Y Shafir, R Garg, X Chen… - … on Computer Vision, 2024 - Springer
While existing image-text alignment models reach high quality binary assessments, they fall
short of pinpointing the exact source of misalignment. In this paper, we present a method to …

Action2Sound: Ambient-aware generation of action sounds from egocentric videos

C Chen, P Peng, A Baid, Z Xue, WN Hsu… - … on Computer Vision, 2024 - Springer
Generating realistic audio for human actions is important for many applications, such as
creating sound effects for films or virtual reality games. Existing approaches implicitly …

Towards building specialized generalist AI with System 1 and System 2 fusion

K Zhang, B Qi, B Zhou - arXiv preprint arXiv:2407.08642, 2024 - arxiv.org
In this perspective paper, we introduce the concept of Specialized Generalist Artificial
Intelligence (SGAI or simply SGI) as a crucial milestone toward Artificial General Intelligence …

Creativity in AI: Progresses and Challenges

M Ismayilzada, D Paul, A Bosselut… - arXiv preprint arXiv …, 2024 - arxiv.org
Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely
studied as a crucial aspect of human cognition. Machine creativity on the other hand has …

PS-StyleGAN: Illustrative Portrait Sketching Using Attention-Based Style Adaptation

KK Jain, JA Varun, A Namboodiri - International Conference on Pattern …, 2024 - Springer
Portrait sketching involves capturing identity specific attributes of a real face with abstract
lines and shades. Unlike photo-realistic images, a good portrait sketch generation method …

Latent Diffusion for Guided Document Table Generation

SJH Hamdani, S Saifullah, S Agne, A Dengel… - … on Document Analysis …, 2024 - Springer
Obtaining annotated table structure data for complex tables is a challenging task due to the
inherent diversity and complexity of real-world document layouts. The scarcity of publicly …

Position: Levels of AGI for Operationalizing Progress on the Path to AGI

MR Morris, J Sohl-Dickstein, N Fiedel… - Forty-first International … - openreview.net
We propose a framework for classifying the capabilities and behavior of Artificial General
Intelligence (AGI) models and their precursors. This framework introduces levels of AGI …