- Academic Search

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

Speichern Zitieren Zitiert von: 396 Ähnliche Artikel Alle 11 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] acm.org

Membership inference attacks on machine learning: A survey

H Hu, Z Salcic, L Sun, G Dobbie, PS Yu… - ACM Computing Surveys …, 2022 - dl.acm.org

Machine learning (ML) models have been widely applied to various applications, including
image classification, text generation, audio recognition, and graph data analysis. However …

Speichern Zitieren Zitiert von: 564 Ähnliche Artikel Alle 6 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Dinov2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arxiv preprint arxiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Speichern Zitieren Zitiert von: 2250 Ähnliche Artikel Alle 11 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Scaling up gans for text-to-image synthesis

M Kang, JY Zhu, R Zhang, J Park… - Proceedings of the …, 2023 - openaccess.thecvf.com

The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …

Speichern Zitieren Zitiert von: 536 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] openreview.net

Unified-io: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

Speichern Zitieren Zitiert von: 406 Ähnliche Artikel Alle 3 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] springer.com

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer

While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Speichern Zitieren Zitiert von: 796 Ähnliche Artikel Alle 8 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Magicbrush: A manually annotated dataset for instruction-guided image editing

K Zhang, L Mo, W Chen, H Sun… - Advances in Neural …, 2024 - proceedings.neurips.cc

Text-guided image editing is widely needed in daily life, ranging from personal use to
professional applications such as Photoshop. However, existing methods are either zero …

Speichern Zitieren Zitiert von: 179 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2 a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …

Speichern Zitieren Zitiert von: 105 Ähnliche Artikel Alle 3 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Deep model reassembly

X Yang, D Zhou, S Liu, J Ye… - Advances in neural …, 2022 - proceedings.neurips.cc

In this paper, we explore a novel knowledge-transfer task, termed as Deep Model
Reassembly (DeRy), for general-purpose model reuse. Given a collection of heterogeneous …

Speichern Zitieren Zitiert von: 150 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

On-device training under 256kb memory

J Lin, L Zhu, WM Chen, WC Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc

On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …

Speichern Zitieren Zitiert von: 230 Ähnliche Artikel Alle 8 Versionen HTML-Version

Alert erstellen

Zitieren

Erweiterte Suche

In „Meine Bibliothek“ gespeichert

Caltech-UCSD birds 200

From show to tell: A survey on deep learning-based image captioning

Membership inference attacks on machine learning: A survey

Dinov2: Learning robust visual features without supervision

Scaling up gans for text-to-image synthesis

Unified-io: A unified model for vision, language, and multi-modal tasks

Visual attention network

Magicbrush: A manually annotated dataset for instruction-guided image editing

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

Deep model reassembly

On-device training under 256kb memory