Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
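
A minimal sketch of the underlying idea: a two-point zeroth-order gradient estimate needs only two loss evaluations (forward passes) per step, so no activations are stored for backpropagation. This illustrates the generic estimator, not the paper's exact memory-efficient implementation; `loss_fn` and the flat parameter vector are assumptions for the example.

```python
import numpy as np

def zo_grad_estimate(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate: forward passes only."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)           # random perturbation direction
    scale = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    return scale * z                               # estimates the gradient of a smoothed loss

# toy usage: ZO-SGD on a quadratic
loss = lambda w: float(np.sum((w - 1.0) ** 2))
w = np.zeros(10)
for _ in range(2000):
    w -= 0.05 * zo_grad_estimate(loss, w)
```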

A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization

HQ Cai, Y Lou, D McKenzie… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We consider the zeroth-order optimization problem in the huge-scale setting, where the
dimension of the problem is so large that performing even basic vector operations on the …
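
The huge-scale setting motivates updating only one coordinate block per iteration, so each step perturbs and touches a small slice of the parameter vector. A hedged sketch under simplifying assumptions (dense storage, uniformly sampled blocks); the paper's actual algorithm and sampling scheme may differ.

```python
import numpy as np

def zo_bcd_step(loss_fn, theta, block_size=64, eps=1e-3, lr=0.1, rng=None):
    """One zeroth-order block coordinate descent step on a random block."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(theta.size, size=min(block_size, theta.size), replace=False)
    z = np.zeros_like(theta)
    z[idx] = rng.standard_normal(idx.size)         # perturb only the chosen block
    scale = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    theta[idx] -= lr * scale * z[idx]              # update only block coordinates
    return theta
```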

A Hamilton–Jacobi-based proximal operator

S Osher, H Heaton, S Wu Fung - Proceedings of the National Academy of Sciences, 2023 - National Academy of Sciences
First-order optimization algorithms are widely used today. Two standard building blocks in
these algorithms are proximal operators (proximals) and gradients. Although gradients can …
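
For reference, the two building blocks named here are, for a cost f and step t > 0,

```latex
\operatorname{prox}_{tf}(x) = \arg\min_{y}\Big( f(y) + \tfrac{1}{2t}\|y - x\|^2 \Big),
\qquad
u(x, t) = \min_{y}\Big( f(y) + \tfrac{1}{2t}\|y - x\|^2 \Big),
```

where u is the Moreau envelope. For convex f, the Hopf–Lax formula says u solves the Hamilton–Jacobi equation $u_t + \tfrac{1}{2}\|\nabla_x u\|^2 = 0$, and $\operatorname{prox}_{tf}(x) = x - t \nabla_x u(x, t)$ wherever u is differentiable, which is the connection the title refers to.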

DeepZero: scaling up zeroth-order optimization for deep model training

A Chen, Y Zhang, J Jia, J Diffenderfer, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Zeroth-order (ZO) optimization has become a popular technique for solving machine
learning (ML) problems when first-order (FO) information is difficult or impossible to obtain …
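
One scalable alternative to random-direction estimates is a deterministic coordinate-wise finite-difference estimate, which parallelizes across coordinates. A minimal sketch of that generic estimator; DeepZero's actual method adds sparsity and reuse machinery not shown here.

```python
import numpy as np

def zo_coordinate_grad(loss_fn, theta, eps=1e-3):
    """Coordinate-wise forward-difference gradient estimate."""
    base = loss_fn(theta)
    grad = np.zeros_like(theta)
    for i in range(theta.size):                    # one extra forward pass per coordinate
        theta[i] += eps
        grad[i] = (loss_fn(theta) - base) / eps
        theta[i] -= eps
    return grad
```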

Revisiting zeroth-order optimization for memory-efficient LLM fine-tuning: A benchmark

Y Zhang, P Li, J Hong, J Li, Y Zhang, W Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained
Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has …

Zeroth-order algorithms for stochastic distributed nonconvex optimization

X Yi, S Zhang, T Yang, KH Johansson - Automatica, 2022 - Elsevier
In this paper, we consider a stochastic distributed nonconvex optimization problem with the
cost function being distributed over n agents having access only to zeroth-order (ZO) …
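
The problem structure referred to is the consensus form

```latex
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
```

where agent i can only query function values of its local cost $f_i$, estimates $\nabla f_i$ from those queries, and exchanges iterates with its neighbors; the specific estimator and communication scheme vary by algorithm.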

How to robustify black-box ml models? a zeroth-order optimization perspective

Y Zhang, Y Yao, J Jia, J Yi, M Hong, S Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
The lack of adversarial robustness has been recognized as an important issue for state-of-
the-art machine learning (ML) models, e.g., deep neural networks (DNNs). Thereby …

DPZero: dimension-independent and differentially private zeroth-order optimization

L Zhang, KK Thekumparampil, S Oh… - International Workshop on …, 2023 - openreview.net
The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-
specific data faces two major challenges in memory and privacy. First, as the size of LLMs …
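
A common route to making zeroth-order updates private, plausibly related to the title's dimension-independence claim, exploits the fact that the update direction is (scalar) × z: if z is reproducible from a public seed, only the scalar finite difference has to be clipped and noised. A hypothetical sketch; calibrating sigma to a formal (ε, δ) guarantee is omitted.

```python
import numpy as np

def dp_zo_grad(loss_fn, theta, eps=1e-3, clip=1.0, sigma=1.0, rng=None):
    """ZO estimate where only the scalar difference is privatized (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)           # shareable via a public seed
    s = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    s = float(np.clip(s, -clip, clip)) + sigma * rng.standard_normal()  # clip + noise
    return s * z
```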

Stochastic zeroth-order Riemannian derivative estimation and optimization

J Li, K Balasubramanian, S Ma - Mathematics of Operations Research, 2023 - pubsonline.informs.org
We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded
in Euclidean space, where the task is to solve Riemannian optimization problems with only …
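
A generic instance of the setting, sketched on the unit sphere: perturb within the tangent space, form a two-point estimate from function values alone, and retract back onto the manifold. This illustrates the recipe, not the paper's estimator or its analysis.

```python
import numpy as np

def zo_sphere_step(loss_fn, theta, eps=1e-3, lr=0.1, rng=None):
    """One zeroth-order step on the unit sphere (an embedded submanifold)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)
    z -= (z @ theta) * theta                       # project onto the tangent space at theta
    scale = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    theta = theta - lr * scale * z                 # step along the estimated Riemannian gradient
    return theta / np.linalg.norm(theta)           # retract back onto the sphere
```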

Zeroth-order hard-thresholding: gradient error vs. expansivity

W de Vazelhes, H Zhang, H Wu… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
$\ell_0$-constrained optimization is prevalent in machine learning, particularly for
high-dimensional problems, because it is a fundamental approach to achieve sparse …
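
The algorithmic template behind the title: a zeroth-order gradient step followed by hard thresholding, the (expansive) projection onto the $\ell_0$ ball that keeps only the k largest-magnitude entries. A hedged sketch of that template; the paper's contribution is analyzing how the ZO gradient error interacts with the operator's expansivity.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def zo_ht_step(loss_fn, theta, k, eps=1e-3, lr=0.1, rng=None):
    """Zeroth-order gradient step + hard thresholding (sparse iterate)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)
    scale = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    return hard_threshold(theta - lr * scale * z, k)
```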