A Practitioner's Guide to Continual Multimodal Pretraining

K Roth, V Udandarao, S Dziadzio, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …
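
The setting this abstract sketches, a pretrained vision-language model that must absorb newly arriving data without forgetting, can be pictured as an update loop that mixes fresh pairs with replayed older ones. A minimal sketch, assuming a CLIP-style model exposing encode_image/encode_text; the replay recipe and batch format are illustrative, not the paper's actual protocol:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE over matched image/text pairs (CLIP-style objective).
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def continual_step(model, new_batch, replay_batch, optimizer):
    # One continual-pretraining update: concatenate newly crawled pairs with
    # pairs replayed from earlier pretraining data to limit forgetting.
    images = torch.cat([new_batch["image"], replay_batch["image"]])
    texts = torch.cat([new_batch["text"], replay_batch["text"]])
    img_emb = F.normalize(model.encode_image(images), dim=-1)
    txt_emb = F.normalize(model.encode_text(texts), dim=-1)
    loss = clip_loss(img_emb, txt_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```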

Improving intervention efficacy via concept realignment in concept bottleneck models

N Singhi, JM Kim, K Roth, Z Akata - European Conference on Computer …, 2024 - Springer
Concept Bottleneck Models (CBMs) ground image classification on human-
understandable concepts to allow for interpretable model decisions as well as human …
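
For context, a CBM predicts a vector of concept activations and classifies from those activations alone, which lets a human intervene by overwriting mispredicted concepts at test time. A minimal sketch, with hypothetical dimensions and layer names; the paper's contribution, realigning the remaining concepts after such an edit, is not implemented here:

```python
import torch

class ConceptBottleneck(torch.nn.Module):
    # Tiny CBM: input features -> concept activations -> class logits.
    def __init__(self, in_dim=512, n_concepts=64, n_classes=10):
        super().__init__()
        self.to_concepts = torch.nn.Linear(in_dim, n_concepts)
        self.to_labels = torch.nn.Linear(n_concepts, n_classes)

    def forward(self, x, intervened=None, mask=None):
        c = torch.sigmoid(self.to_concepts(x))  # predicted concept activations
        if intervened is not None:
            # Intervention: overwrite selected concepts with human-provided values.
            c = torch.where(mask.bool(), intervened, c)
        return self.to_labels(c), c
```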

UNIC: Universal classification models via multi-teacher distillation

MB Sariyildiz, P Weinzaepfel, T Lucas, D Larlus… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained models have become a commodity and offer strong results on a broad range of
tasks. In this work, we focus on classification and seek to learn a unique encoder able to …
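
As a rough illustration of the multi-teacher setup, one common recipe regresses per-teacher projections of a shared student encoder onto frozen teacher features; the sketch below assumes that recipe (names and the MSE objective are illustrative, not UNIC's exact method):

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_step(student, teachers, projectors, x, optimizer):
    # Distill several frozen teachers into one student encoder via
    # per-teacher projection heads trained jointly with the student.
    feats = student(x)
    loss = 0.0
    for teacher, proj in zip(teachers, projectors):
        with torch.no_grad():
            target = teacher(x)                        # frozen teacher features
        loss = loss + F.mse_loss(proj(feats), target)  # per-teacher regression
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```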

Active data curation effectively distills large-scale multimodal models

V Udandarao, N Parthasarathy, MF Naeem… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into
smaller ones. Prior works have explored ever more complex KD strategies involving different …
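
One common active-curation criterion scores candidates by "learnability": high loss under the current learner but low loss under a stronger reference model. The sketch below assumes that criterion and precomputed per-example losses; it is not necessarily the selection rule this paper uses:

```python
import torch

def select_learnable(learner_losses, reference_losses, k):
    # Rank candidates by (learner loss - reference loss) and keep the top-k:
    # examples the learner still gets wrong but a reference model finds easy.
    scores = learner_losses - reference_losses
    return torch.topk(scores, k).indices

# usage: idx = select_learnable(l_losses, r_losses, k=256); then train on batch[idx]
```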

Towards a Decentralized Collaborative Framework for Scalable Edge AI

AM Abdelmoniem, M Jaber, A Anwar, Y Zhang, M Gao - Future Internet, 2024 - mdpi.com
Edge intelligence has seen unprecedented growth in many of our daily-life
applications. Traditionally, most applications have required significant effort in data collection …
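
The snippet stops before describing the framework itself; as a generic point of reference, decentralized and collaborative edge training often builds on federated averaging of client models. A FedAvg-style sketch, not necessarily this paper's protocol:

```python
import torch

def federated_average(client_states, weights=None):
    # Weighted average of per-client model state dicts (FedAvg-style).
    n = len(client_states)
    weights = weights if weights is not None else [1.0 / n] * n
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(w * s[key].float() for w, s in zip(weights, client_states))
    return avg

# usage: global_model.load_state_dict(federated_average([c.state_dict() for c in clients]))
```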

How to Merge Your Multimodal Models Over Time?

S Dziadzio, V Udandarao, K Roth, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging combines multiple expert models, each finetuned from a base foundation
model on diverse tasks and domains, into a single, more capable model. However, most existing …
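
As a baseline picture of what merging "over time" means, the simplest strategy folds each newly arriving expert into a running average of all parameters seen so far. A minimal sketch of that uniform temporal averaging (a simple baseline, not necessarily the strategy the paper recommends):

```python
import torch

def merge_over_time(merged_state, new_expert_state, step):
    # Fold the newest expert checkpoint into a running parameter average.
    # `merged_state` holds the mean of `step` expert checkpoints seen so far.
    out = {}
    for key in merged_state:
        delta = new_expert_state[key].float() - merged_state[key].float()
        out[key] = merged_state[key].float() + delta / (step + 1)
    return out

# usage: merged = experts[0].state_dict()
#        for t, expert in enumerate(experts[1:], start=1):
#            merged = merge_over_time(merged, expert.state_dict(), t)
```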

ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

M Bini, K Roth, Z Akata, A Khoreva - arXiv preprint arXiv:2405.20271, 2024 - arxiv.org
Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to
downstream task requirements while retaining their generalization ability. However, the …
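
The "hyperplane reflections" in the title are Householder transforms H = I - 2uuᵀ, which are orthogonal and hence distance-preserving. A minimal sketch of adapting a frozen linear layer through one learnable reflection; module and parameter names are illustrative, and this is a simplified reading of the idea rather than the paper's full method:

```python
import torch
import torch.nn.functional as F

class ReflectedLinear(torch.nn.Module):
    # Frozen linear layer adapted by a learnable Householder reflection
    # H = I - 2 u u^T: only the hyperplane normal u is trained.
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.u = torch.nn.Parameter(torch.randn(base.in_features))

    def forward(self, x):
        u = self.u / self.u.norm()  # unit normal of the reflecting hyperplane
        w = self.base.weight - 2 * torch.outer(self.base.weight @ u, u)  # W @ H
        return F.linear(x, w, self.base.bias)
```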

Weak-to-Strong Enhanced Vision Model

J Guo, H Chen, C Wang, K Han, C Xu, Y Wang - openreview.net
Recent advances in large language and vision models have demonstrated
extraordinary capabilities, driving researchers to train ever larger models in pursuit …
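
The snippet is cut off before the method; the generic weak-to-strong setup it alludes to trains a stronger student against a weaker, frozen model's predictions. A minimal sketch under that assumption (names and the soft-label loss are illustrative):

```python
import torch
import torch.nn.functional as F

def weak_to_strong_step(strong, weak, x, optimizer):
    # Train a strong model against a frozen weak model's soft pseudo-labels.
    with torch.no_grad():
        pseudo = weak(x).softmax(dim=-1)       # weak supervision signal
    loss = F.cross_entropy(strong(x), pseudo)  # soft-target cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```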