Google Académico

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org

Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Guardar Citar Citado por 1610 Artículos relacionados Las 8 versiones

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2024 - Elsevier

Abstract Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …

Guardar Citar Citado por 199 Artículos relacionados Las 8 versiones

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2023 - proceedings.neurips.cc

Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Guardar Citar Citado por 5412 Artículos relacionados Las 18 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arxiv preprint arxiv …, 2023 - arxiv.org

Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

Guardar Citar Citado por 2454 Artículos relacionados Las 6 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Guardar Citar Citado por 853 Artículos relacionados Las 10 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in neural …, 2023 - proceedings.neurips.cc

In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

Guardar Citar Citado por 539 Artículos relacionados Las 7 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Sigmoid loss for language image pre-training

X Zhai, B Mustafa, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com

We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …

Guardar Citar Citado por 700 Artículos relacionados Las 5 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] nih.gov

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

Guardar Citar Citado por 360 Artículos relacionados Las 6 versiones

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang… - arxiv preprint arxiv …, 2023 - arxiv.org

Large language models have shown their remarkable capabilities as a general interface for
various language-related applications. Motivated by this, we target to build a unified …

Guardar Citar Citado por 529 Artículos relacionados Las 10 versiones Versión en HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S **, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Guardar Citar Citado por 462 Artículos relacionados Las 11 versiones

Crear alerta

Citar

Búsqueda avanzada

Guardado en Mi biblioteca

Florence: A new foundation model for computer vision

Diffusion models: A comprehensive survey of methods and applications

A comprehensive survey on applications of transformers for deep learning tasks

Visual instruction tuning

Qwen technical report

Imagebind: One embedding space to bind them all

Segment everything everywhere all at once

Sigmoid loss for language image pre-training

Towards a general-purpose foundation model for computational pathology

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

Vision-language models for vision tasks: A survey