Differentially Private Representation Learning via Image Captioning

T Sander, Y Yu, M Sanjabi, A Durmus, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Differentially private (DP) machine learning is considered the gold-standard solution for
training a model from sensitive data while still preserving privacy. However, a major barrier …
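The snippet refers to differentially private training in general. As a point of reference (not this paper's captioning-specific recipe), the standard DP-SGD pattern clips each per-example gradient and adds calibrated Gaussian noise before the update; the sketch below illustrates that on toy logistic regression, with all names and hyperparameters chosen for illustration only.

```python
# Minimal DP-SGD sketch: per-example gradient clipping + Gaussian noise.
# Illustrative only; not the cited paper's image-captioning training code.
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD step for logistic regression on a mini-batch (X, y)."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))                 # per-example prediction
        g = (p - yi) * xi                                 # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)   # clip norm to clip_norm
        grads.append(g)
    g_sum = np.sum(grads, axis=0)
    # Gaussian noise scaled to the clipping norm is what yields the DP guarantee.
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=w.shape)
    return w - lr * (g_sum + noise) / len(X)

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
```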

Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding

W Li, Z Huang, X Tian, L Lu, H Li… - Proceedings of the …, 2024 - aclanthology.org
Contrastively trained vision-language models such as CLIP have achieved remarkable
progress in vision and language representation learning. Despite the promising progress …
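The abstract mentions contrastively trained vision-language models such as CLIP. For context, CLIP's objective is a symmetric image-text contrastive (InfoNCE-style) loss over a batch of paired embeddings; the sketch below is a minimal, assumed illustration of that objective, not the cited paper's code, and the tensor names and shapes are placeholders.

```python
# Minimal sketch of a CLIP-style symmetric image-text contrastive loss.
# Embedding names, dimensions, and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) embeddings of paired images and captions."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature    # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))             # matching pairs lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt).item())
```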

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding

L Zhang, R Awal, A Agrawal - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Vision-Language Models (VLMs) such as CLIP exhibit strong image-text
comprehension abilities, facilitating advances in several downstream tasks such as zero-shot …

Causal Graphical Models for Vision-Language Compositional Understanding

F Parascandolo, N Moratelli, E Sangineto… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has empirically shown that Vision-Language Models (VLMs) struggle to fully
understand the compositional properties of human language, usually modeling an …

Differentially Private Vision-Language Foundation Models via Image Captioning

T Sander, Y Yu, M Sanjabi, AO Durmus, Y Ma… - openreview.net
The common practice of training foundation models on web-crawled data raises privacy and
copyright concerns, as sensitive training data can be memorized by the model and …