Fine-tuning language models with just forward passes
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
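For context, forward-pass-only fine-tuning of this kind rests on a two-point (SPSA-style) finite-difference gradient estimate: perturb the parameters along a random direction, evaluate the loss twice, and step along that direction. A minimal numpy sketch, with illustrative names and hyperparameters that are not taken from the paper:

import numpy as np

def zo_sgd_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    # One zeroth-order SGD step from a two-point (SPSA-style) estimate.
    # loss_fn is evaluated with forward passes only; no backpropagation.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)          # random direction
    loss_plus = loss_fn(params + eps * z)          # forward pass 1
    loss_minus = loss_fn(params - eps * z)         # forward pass 2
    g = (loss_plus - loss_minus) / (2 * eps)       # directional derivative estimate
    return params - lr * g * z                     # step along z

# Toy usage: minimize a quadratic with forward evaluations only.
theta = np.ones(5)
for t in range(1000):
    theta = zo_sgd_step(theta, lambda w: float(np.sum(w ** 2)), seed=t)

MeZO-style methods additionally regenerate the perturbation from a stored seed rather than keeping z in memory, which is what keeps the footprint close to inference cost.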
A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization
We consider the zeroth-order optimization problem in the huge-scale setting, where the
dimension of the problem is so large that performing even basic vector operations on the …
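A block coordinate flavor of zeroth-order descent sidesteps full-vector work by perturbing and updating only one small block of coordinates per step. A minimal sketch under that assumption (illustrative, not the paper's algorithm):

import numpy as np

def zo_block_step(x, f, block, eps=1e-4, lr=1e-2, rng=None):
    # Two-point zeroth-order update restricted to one coordinate block,
    # so the gradient estimate and the update touch only a slice of x.
    rng = rng or np.random.default_rng()
    u = np.zeros_like(x)
    u[block] = rng.standard_normal(len(block))     # perturb the block only
    d = (f(x + eps * u) - f(x - eps * u)) / (2 * eps)
    x = x.copy()
    x[block] -= lr * d * u[block]                  # update the block only
    return x

In a truly huge-scale setting even the full-vector additions above would be replaced by block-local evaluations; the sketch only conveys the block-coordinate structure.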
A Hamilton–Jacobi-based proximal operator
First-order optimization algorithms are widely used today. Two standard building blocks in
these algorithms are proximal operators (proximals) and gradients. Although gradients can …
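For reference, the proximal operator of a function f with step parameter λ > 0 is, in standard notation (not specific to this paper):

\operatorname{prox}_{\lambda f}(x) = \operatorname*{arg\,min}_{y} \Big\{ f(y) + \tfrac{1}{2\lambda} \|y - x\|_2^2 \Big\}

Evaluating it exactly means solving this inner minimization, which is typically harder than computing a gradient of f.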
DeepZero: Scaling up zeroth-order optimization for deep model training
Zeroth-order (ZO) optimization has become a popular technique for solving machine
learning (ML) problems when first-order (FO) information is difficult or impossible to obtain …
Revisiting zeroth-order optimization for memory-efficient LLM fine-tuning: A benchmark
In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained
Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has …
Zeroth-order algorithms for stochastic distributed nonconvex optimization
In this paper, we consider a stochastic distributed nonconvex optimization problem with the
cost function being distributed over n agents having access only to zeroth-order (ZO) …
How to robustify black-box ml models? a zeroth-order optimization perspective
The lack of adversarial robustness has been recognized as an important issue for state-of-
the-art machine learning (ML) models, e.g., deep neural networks (DNNs). Thereby …
DPZero: dimension-independent and differentially private zeroth-order optimization
The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-
specific data faces two major challenges in memory and privacy. First, as the size of LLMs …
Stochastic zeroth-order Riemannian derivative estimation and optimization
We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded
in Euclidean space, where the task is to solve Riemannian optimization problems with only …
Zeroth-order hard-thresholding: gradient error vs. expansivity
$\ell_0$-constrained optimization is prevalent in machine learning, particularly for
high-dimensional problems, because it is a fundamental approach to achieve sparse …
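For reference, the hard-thresholding operator that enforces an $\ell_0$ (sparsity) constraint keeps the k largest-magnitude entries and zeroes the rest. A minimal sketch of the standard operator (not the paper's zeroth-order estimator):

import numpy as np

def hard_threshold(x, k):
    # Project x onto the set of k-sparse vectors: keep the k entries
    # of largest magnitude, zero out everything else.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]   # indices of top-k magnitudes
    out[idx] = x[idx]
    return out

# Example: project onto 2-sparse vectors.
print(hard_threshold(np.array([0.1, -3.0, 0.5, 2.0]), k=2))  # [ 0. -3.  0.  2.]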