Normalization techniques in training DNNs: Methodology, analysis and application

L Huang, J Qin, Y Zhou, F Zhu, L Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …
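
As a concrete reference point for the techniques this survey covers, below is a minimal NumPy sketch of the batch-normalization forward pass (training mode, without running statistics); the variable names and the omission of inference-time statistics are my simplifications, not code from the survey.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch dimension, then rescale.

    x: (batch, features) activations; gamma, beta: learnable (features,) vectors.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta            # learnable shift/scale restores capacity
```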

Machine learning in process systems engineering: Challenges and opportunities

P Daoutidis, JH Lee, S Rangarajan, L Chiang… - Computers & Chemical …, 2024 - Elsevier
This “white paper” is a concise perspective on the potential of machine learning in the
process systems engineering (PSE) domain, based on a session during FIPSE 5, held in …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
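
The title's “just forward passes” refers to zeroth-order optimization. Below is a minimal sketch of the two-point (SPSA) gradient estimator this line of work builds on; the paper's actual method regenerates the perturbation in place from a saved RNG seed rather than materializing it, and this flat-parameter-vector toy version is my simplification.

```python
import numpy as np

def spsa_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    """One zeroth-order step using two forward passes and no backpropagation.

    Reusing the seed lets the perturbation be regenerated instead of stored,
    which is what makes the approach memory-efficient.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)              # random direction
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    grad_scale = (loss_plus - loss_minus) / (2 * eps)  # directional derivative estimate
    return params - lr * grad_scale * z
```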

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
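
The title's dense-and-sparse decomposition splits a weight matrix into a handful of full-precision outliers plus a low-bit dense remainder. A rough sketch follows; the paper's actual quantizer is sensitivity-based and non-uniform, so the uniform quantization and magnitude-based outlier rule here are simplifying assumptions of mine.

```python
import numpy as np

def dense_and_sparse(w, outlier_frac=0.005, n_bits=4):
    """Split weights into sparse FP outliers plus a quantized dense part."""
    thresh = np.quantile(np.abs(w), 1 - outlier_frac)
    sparse = np.where(np.abs(w) >= thresh, w, 0.0)  # few extreme values, full precision
    dense = w - sparse                              # narrow-range remainder

    # Uniform n-bit quantization of the dense part (illustrative only).
    levels = 2 ** n_bits - 1
    scale = (dense.max() - dense.min()) / levels
    scale = scale if scale > 0 else 1.0
    q = np.round((dense - dense.min()) / scale)
    dense_hat = q * scale + dense.min()             # dequantized dense part

    return dense_hat + sparse                       # approximate reconstruction
```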

Why transformers need Adam: A Hessian perspective

Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of the Hessian: (i) …
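
The paper's argument concerns Adam's per-coordinate step sizes, which can adapt to the heterogeneous curvature across Transformer blocks where SGD's single global step size cannot. For reference, a standard Adam update is sketched below; this is textbook Adam, not the paper's analysis code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step (t >= 1): per-coordinate steps from moment estimates."""
    m = b1 * m + (1 - b1) * grad          # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (squared gradients)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```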

Characterizing possible failure modes in physics-informed neural networks

A Krishnapriyan, A Gholami, S Zhe… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent work in scientific machine learning has developed so-called physics-informed neural
network (PINN) models. The typical approach is to incorporate physical domain knowledge …
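
For context on what a PINN optimizes, here is a minimal PyTorch sketch of the soft-constraint residual loss for a 1D diffusion equation u_t = nu * u_xx; the architecture, the choice of PDE, and the value of nu are illustrative assumptions, not taken from the paper.

```python
import torch

# Tiny network u(x, t); width and depth are arbitrary illustrative choices.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def residual_loss(xt, nu=0.1):
    """Soft-constraint PDE residual u_t - nu * u_xx, computed via autograd."""
    xt = xt.clone().requires_grad_(True)   # columns: (x, t)
    u = net(xt)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return ((u_t - nu * u_xx) ** 2).mean()  # minimized jointly with IC/BC terms
```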

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
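
Sophia preconditions momentum with a cheap diagonal Hessian estimate (refreshed only every few steps) and clips the resulting step per coordinate. Below is a schematic of that update as I read the paper's description, with the Hessian estimation itself omitted and the hyperparameter values purely illustrative; treat it as a sketch, not a faithful implementation.

```python
import numpy as np

def sophia_step(theta, m, h, lr=1e-4, gamma=0.01, eps=1e-12):
    """Schematic Sophia update: momentum m divided by a diagonal Hessian
    estimate h (Newton-like scaling), then clipped per coordinate so flat or
    mis-estimated directions cannot produce huge steps."""
    precond = m / np.maximum(gamma * h, eps)
    return theta - lr * np.clip(precond, -1.0, 1.0)
```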

Revisiting weighted aggregation in federated learning with neural networks

Z Li, T Lin, X Shang, C Wu - International Conference on …, 2023 - proceedings.mlr.press
In federated learning (FL), weighted aggregation of local models is conducted to generate a
global model, and the aggregation weights are normalized (the sum of weights is 1) and …
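
As a baseline for the weighting schemes the paper revisits, a FedAvg-style aggregation with normalized weights might look like the sketch below; the data-size-proportional weighting and dict-of-arrays model representation are illustrative choices, not the paper's method.

```python
import numpy as np

def aggregate(client_params, client_sizes):
    """Convex combination of client parameter dicts; weights sum to 1."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]  # proportional to local data size
    return {
        name: sum(w * p[name] for w, p in zip(weights, client_params))
        for name in client_params[0]
    }

# e.g. two clients sharing one layer: result is 0.8 * ones(3)
clients = [{"w": np.ones(3)}, {"w": np.zeros(3)}]
global_model = aggregate(clients, client_sizes=[80, 20])
```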

Diverse weight averaging for out-of-distribution generalization

A Rame, M Kirchmeyer, T Rahier… - Advances in …, 2022 - proceedings.neurips.cc
Standard neural networks struggle to generalize under distribution shifts in computer vision.
Fortunately, combining multiple networks can consistently improve out-of-distribution …
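
Weight averaging combines several fine-tuned models into a single network with no inference-time ensembling cost; it presumes the models share an architecture and a common initialization, as in the paper's setting. A minimal sketch:

```python
def average_weights(state_dicts):
    """Uniform average of parameters from models fine-tuned with diverse
    hyperparameters, yielding one network for inference."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```

The dict values can be NumPy arrays or framework tensors; only elementwise addition and scalar division are assumed.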

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …