A comprehensive survey of image augmentation techniques for deep learning

M Xu, S Yoon, A Fuentes, DS Park - Pattern Recognition, 2023 - Elsevier
Although deep learning has achieved satisfactory performance in computer vision, a large
volume of images is required. However, collecting images is often expensive and …

Data augmentation: A comprehensive survey of modern approaches

A Mumuni, F Mumuni - Array, 2022 - Elsevier
To ensure good performance, modern machine learning models typically require large
amounts of quality annotated data. Meanwhile, the data collection and annotation processes …

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Deep learning in food category recognition

Y Zhang, L Deng, H Zhu, W Wang, Z Ren, Q Zhou… - Information …, 2023 - Elsevier
Integrating artificial intelligence with food category recognition has been a field of interest for
research for the past few decades. It is potentially one of the next steps in revolutionizing …

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B **e, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Hyena hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

Efficientvit: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …

Eva-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

Vision gnn: An image is worth graph of nodes

K Han, Y Wang, J Guo, Y Tang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network architecture plays a key role in the deep learning-based computer vision system.
The widely-used convolutional neural network and transformer treat the image as a grid or …