Write and paint: Generative vision-language models are unified modal learners

S Diao, W Zhou, X Zhang, J Wang - arXiv preprint arXiv:2206.07699, 2022 - arxiv.org
Recent advances in vision-language pre-training have pushed the state-of-the-art on
various vision-language tasks, making machines more capable of multi-modal writing …

Data-Efficient French Language Modeling with CamemBERTa

W Antoun, B Sagot, D Seddah - arXiv preprint arXiv:2306.01497, 2023 - arxiv.org
Recent advances in NLP have significantly improved the performance of language models
on a variety of tasks. While these advances are largely driven by the availability of large …

AdaGrad under Anisotropic Smoothness

Y Liu, R Pan, T Zhang - arXiv preprint arXiv:2406.15244, 2024 - arxiv.org
Adaptive gradient methods have been widely adopted in training large-scale deep neural
networks, especially large foundation models. Despite the huge success in practice, their …

Accelerated convergence of stochastic heavy ball method under anisotropic gradient noise

R Pan, Y Liu, X Wang, T Zhang - arXiv preprint arXiv:2312.14567, 2023 - arxiv.org
Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing
deep learning models. In contrast to its empirical popularity, the understanding of its …

Towards Efficient and Domain-Aware Adaptation of Foundation Models

S Diao - 2023 - search.proquest.com
In the burgeoning realm of artificial intelligence, foundation models stand out as a pivotal
advancement. Foundation models centralize the information from broad data across various …