Data augmentation: A comprehensive survey of modern approaches

A Mumuni, F Mumuni - Array, 2022 - Elsevier
To ensure good performance, modern machine learning models typically require large
amounts of high-quality annotated data. Meanwhile, the data collection and annotation processes …

A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org
Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite recent creative work in tackling FSL tasks, learning valid information …

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

Deep learning in food category recognition

Y Zhang, L Deng, H Zhu, W Wang, Z Ren, Q Zhou… - Information …, 2023 - Elsevier
Integrating artificial intelligence with food category recognition has been an active field of
research for the past few decades. It is potentially one of the next steps in revolutionizing …

FLatten Transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …

Hyena hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B **e, Q Sun, L Wu… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …