From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …
reason, large research efforts have been devoted to image captioning, ie describing images …
Membership inference attacks on machine learning: A survey
Machine learning (ML) models have been widely applied to various applications, including
image classification, text generation, audio recognition, and graph data analysis. However …
image classification, text generation, audio recognition, and graph data analysis. However …
Dinov2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
quantities of data have opened the way for similar foundation models in computer vision …
Scaling up gans for text-to-image synthesis
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …
general public's imagination. From a technical standpoint, it also marked a drastic change in …
Unified-io: A unified model for vision, language, and multi-modal tasks
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …
computer vision tasks, including pose estimation, object detection, depth estimation and …
Visual attention network
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
mechanism has recently taken various computer vision areas by storm. However, the 2D …
Magicbrush: A manually annotated dataset for instruction-guided image editing
Text-guided image editing is widely needed in daily life, ranging from personal use to
professional applications such as Photoshop. However, existing methods are either zero …
professional applications such as Photoshop. However, existing methods are either zero …
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …
Deep model reassembly
In this paper, we explore a novel knowledge-transfer task, termed as Deep Model
Reassembly (DeRy), for general-purpose model reuse. Given a collection of heterogeneous …
Reassembly (DeRy), for general-purpose model reuse. Given a collection of heterogeneous …
On-device training under 256kb memory
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …