Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing Surveys, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to yield promising performance on many
downstream visual tasks. With the rapid development of pre-trained visual foundation …

Segment anything in high quality

L Ke, M Ye, M Danelljan, YW Tai… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
The recent Segment Anything Model (SAM) represents a big leap in scaling up
segmentation models, allowing for powerful zero-shot capabilities and flexible prompting …

SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has witnessed great development and success in
Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

ONE-PEACE: Exploring one general representation model toward unlimited modalities

P Wang, S Wang, J Lin, S Bai, X Zhou, J Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we explore a scalable way to build a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

DreamVideo: Composing your dream videos with customized subject and motion

Y Wei, S Zhang, Z Qing, H Yuan, Z Liu… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
Customized generation using diffusion models has made impressive progress in image
generation but remains unsatisfactory in the challenging video generation task as it requires …

LanguageBind: Extending video-language pretraining to N-modality by language-based semantic alignment

B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Video-language (VL) pretraining has achieved remarkable improvements in multiple
downstream tasks. However, the current VL pretraining framework is hard to extend to …

Distilling vision-language models on millions of videos

Y Zhao, L Zhao, X Zhou, J Wu… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
The recent advance in vision-language models is largely attributed to the abundance of
image-text data. We aim to replicate this success for video-language models, but there …