VeCLIP: Improving CLIP training via visual-enriched captions

Z Lai, H Zhang, B Zhang, W Wu, H Bai… - … on Computer Vision, 2024 - Springer
Large-scale web-crawled datasets are fundamental for the success of pre-training vision-
language models, such as CLIP. However, the inherent noise and potential irrelevance of …

No" zero-shot" without exponential data: Pretraining concept frequency determines multimodal model performance

V Udandarao, A Prabhu, A Ghosh… - The Thirty-eighth …, 2024 - openreview.net
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation
performance of multimodal models, such as CLIP for classification and Stable-Diffusion for …

Scaling Laws for Data Filtering--Data Curation cannot be Compute Agnostic

S Goyal, P Maini, ZC Lipton… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully
selected subsets of massive web scrapes. For instance, the LAION public dataset retained …

Sieve: Multimodal dataset pruning using image captioning models

A Mahmoud, M Elhoushi, A Abbas… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-
crawled datasets. This underscores the critical need for dataset pruning as the quality of …

From scarcity to efficiency: Improving CLIP training via visual-enriched captions

Z Lai, H Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan… - 2023 - openreview.net
Web-crawled datasets are pivotal to the success of pre-training vision-language models,
exemplified by CLIP. However, web-crawled AltTexts can be noisy and potentially irrelevant …

HYPE: Hyperbolic entailment filtering for underspecified images and texts

W Kim, S Chun, T Kim, D Han, S Yun - European Conference on Computer …, 2024 - Springer
In an era where the volume of data drives the effectiveness of self-supervised learning, the
specificity and clarity of data semantics play a crucial role in model training. Addressing this …

Rephrasing the web: A recipe for compute and data-efficient language modeling

P Maini, S Seto, H Bai, D Grangier, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are trained on massive scrapes of the web, which are often
unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such …

Parrot captions teach CLIP to spot text

Y Lin, C He, AJ Wang, B Wang, W Li… - European Conference on …, 2024 - Springer
Despite being the foundation model in numerous vision-language applications, CLIP
suffers from a severe text spotting bias. Such bias causes CLIP models to 'Parrot' the visual …

An introduction to vision-language modeling

F Bordes, RY Pang, A Ajay, AC Li, A Bardes… - arXiv preprint arXiv …, 2024 - arxiv.org
Following the recent popularity of Large Language Models (LLMs), several attempts have
been made to extend them to the visual domain. From having a visual assistant that could …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed the rapid development of large language models (LLMs).
Building on these powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …