Google Scholar

Time does tell: Self-supervised time-tuning of dense image representations

M Salehi, E Gavves, CGM Snoek… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatially dense self-supervised learning is a rapidly growing problem domain with
promising applications for unsupervised segmentation and pretraining for dense …

Cited by 19 · All 10 versions

[PDF] arxiv.org

Grounding language models for visual entity recognition

Z Xiao, M Gong, P Cascante-Bonilla, X Zhang… - … on Computer Vision, 2024 - Springer

Abstract We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our
model extends an autoregressive Multimodal Large Language Model by employing retrieval …

Cited by 8 · All 7 versions

[PDF] arxiv.org

Self-supervised visual learning from interactions with objects

A Aubret, C Teulière, J Triesch - European Conference on Computer …, 2024 - Springer

Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …

Cited by 7 · All 7 versions

[PDF] arxiv.org

Representation learning and identity adversarial training for facial behavior understanding

M Ning, AA Salah, IO Ertugrul - arXiv preprint arXiv:2407.11243, 2024 - arxiv.org

Facial Action Unit (AU) detection has gained significant research attention as AUs contain
complex expression information. In this paper, we unpack two fundamental factors in AU …

Cited by 9 · All 4 versions

[PDF] techrxiv.org

Foundation models for video understanding: A survey

N Madan, A Møgelmose, R Modi, YS Rawat… - Authorea …, 2024 - techrxiv.org

Video Foundation Models (ViFMs) aim to develop general-purpose representations for
various video understanding tasks by leveraging large-scale datasets and powerful models …

Cited by 19 · All 4 versions

[BOOK][B] Computer Vision-ECCV 2024: 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XXIV.

A Leonardis, E Ricci, S Roth, O Russakovsky, T Sattler… - 2024 - books.google.com

The multi-volume set of LNCS books with volume numbers 15059 up to 15147 constitutes
the refereed proceedings of the 18th European Conference on Computer Vision, ECCV …

Cited by 2 · All 4 versions

[PDF] arxiv.org

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

G Wang, F Lin, T Wu, Z Liu, Z Ba, K Ren - arXiv preprint arXiv:2412.12032, 2024 - arxiv.org

This work asks: with abundant, unlabeled real faces, how to learn a robust and transferable
facial representation that boosts various face security tasks with respect to generalization …

All 3 versions

[PDF] arxiv.org

CrossVideoMAE: Self-Supervised Image-Video Representation Learning with Masked Autoencoders

SA Ahamed, M Gunawardhana, L David… - arXiv preprint arXiv …, 2025 - arxiv.org

Current video-based Masked Autoencoders (MAEs) primarily focus on learning effective
spatiotemporal representations from a visual perspective, which may lead the model to …

[PDF] carleton.ca

Self-supervised Pretraining of Vision Transformers for Earth Observation

A Fuller - 2023 - repository.library.carleton.ca

Remote sensing offers vast yet sparsely labeled multimodal data but lacks foundation
models that can be leveraged across societally impactful applications. In this thesis, I …

All 3 versions

Saved to "My library"

ViC-MAE: Self-supervised representation learning from images and video with contrastive masked...

Time does tell: Self-supervised time-tuning of dense image representations

Grounding language models for visual entity recognition

Self-supervised visual learning from interactions with objects

Representation learning and identity adversarial training for facial behavior understanding

Foundation models for video understanding: A survey

[BOOK][B] Computer Vision-ECCV 2024: 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XXIV.

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

CrossVideoMAE: Self-Supervised Image-Video Representation Learning with Masked Autoencoders

Self-supervised Pretraining of Vision Transformers for Earth Observation