A comprehensive survey of image augmentation techniques for deep learning
Although deep learning has achieved satisfactory performance in computer vision, a large
volume of images is required. However, collecting images is often expensive and …
Data augmentation: A comprehensive survey of modern approaches
A Mumuni, F Mumuni - Array, 2022 - Elsevier
To ensure good performance, modern machine learning models typically require large
amounts of quality annotated data. Meanwhile, the data collection and annotation processes …
Imagebind: One embedding space to bind them all
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …
Deep learning in food category recognition
Integrating artificial intelligence with food category recognition has been a field of interest for
research for the past few decades. It is potentially one of the next steps in revolutionizing …
Eva: Exploring the limits of masked visual representation learning at scale
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …
Hyena hierarchy: Towards larger convolutional language models
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …
Efficientvit: Memory efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
Flatten transformer: Vision transformer using focused linear attention
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
Eva-02: A visual representation for neon genesis
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
Vision gnn: An image is worth graph of nodes
Network architecture plays a key role in the deep learning-based computer vision system.
The widely-used convolutional neural network and transformer treat the image as a grid or …