Differentially Private Representation Learning via Image Captioning
Differentially private (DP) machine learning is considered the gold-standard solution for
training a model from sensitive data while still preserving privacy. However, a major barrier …
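As background for the snippet above: "DP machine learning" in this setting usually means DP-SGD, i.e. clipping each per-example gradient to a fixed norm and adding calibrated Gaussian noise before the parameter update. The sketch below is a minimal, illustrative PyTorch version of that recipe; the function name, hyperparameters, and per-example loop are placeholders and are not taken from the cited paper.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One illustrative DP-SGD step: clip each per-example gradient, sum,
    add Gaussian noise, then apply the averaged noisy update."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(xs, ys):                               # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_mult * clip_norm  # Gaussian mechanism
            p.add_(-(lr / len(xs)) * (s + noise))                 # noisy averaged update
```

In practice one would track the (ε, δ) budget with a privacy accountant and use a vectorized per-example gradient implementation (e.g. Opacus) instead of an explicit Python loop.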
Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding
W Li, Z Huang, X Tian, L Lu, H Li… - Proceedings of the …, 2024 - aclanthology.org
Contrastively trained vision-language models such as CLIP have achieved remarkable
progress in vision and language representation learning. Despite the promising progress …
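For context on "contrastively trained vision-language models such as CLIP": the core training signal is a symmetric InfoNCE loss over in-batch image-text pairs. The sketch below is a minimal illustration of that loss, assuming pre-computed embeddings; it is not the cited paper's code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for (N, D) image/text embeddings,
    where row i of each tensor is a matching pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)              # image -> caption
    loss_t2i = F.cross_entropy(logits.t(), targets)          # caption -> image
    return (loss_i2t + loss_t2i) / 2
```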
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Vision-Language Models (VLMs) such as CLIP exhibit strong image-text
comprehension abilities facilitating advances in several downstream tasks such as zero-shot …
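The title above refers to augmenting the contrastive objective with hard negatives (e.g. compositionally perturbed captions). A generic sketch of that idea, assuming one hard-negative caption embedding per image, is shown below; it is an illustration of the concept named in the title, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_hard_negatives(image_emb, text_emb, hard_text_emb,
                                          temperature=0.07):
    """CLIP-style image-to-text loss where each image also competes against
    a hard-negative caption (e.g. a shuffled version of its own caption)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    hard_text_emb = F.normalize(hard_text_emb, dim=-1)

    # Logits over the in-batch captions plus the hard negatives (N, 2N).
    candidates = torch.cat([text_emb, hard_text_emb], dim=0)
    logits = image_emb @ candidates.t() / temperature
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    return F.cross_entropy(logits, targets)   # positives are the first N columns
```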
Causal Graphical Models for Vision-Language Compositional Understanding
Recent work has empirically shown that Vision-Language Models (VLMs) struggle to fully
understand the compositional properties of the human language, usually modeling an …
Differentially Private Vision-Language Foundation Models via Image Captioning
The common practice of training foundation models on web-crawled data raises privacy and
copyright concerns, as sensitive training data can be memorized by the model and …