- Academic Search

Normalization techniques in training DNNs: Methodology, analysis and application

L Huang, J Qin, Y Zhou, F Zhu, L Liu… - IEEE transactions on …, 2023 - ieeexplore.ieee.org

Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …

Cited by 393, all 7 versions

[PDF] sciencedirect.com

Machine learning in process systems engineering: Challenges and opportunities

P Daoutidis, JH Lee, S Rangarajan, L Chiang… - Computers & Chemical …, 2024 - Elsevier

This “white paper” is a concise perspective of the potential of machine learning in the
process systems engineering (PSE) domain, based on a session during FIPSE 5, held in …

Cited by 36, all 8 versions

[PDF] neurips.cc

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Cited by 206, all 6 versions

[PDF] arxiv.org

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …

Cited by 184, all 8 versions

[PDF] neurips.cc

Why transformers need Adam: A Hessian perspective

Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc

SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of Hessian: (i) …

Cited by 30, all 5 versions

[PDF] neurips.cc

Characterizing possible failure modes in physics-informed neural networks

A Krishnapriyan, A Gholami, S Zhe… - Advances in neural …, 2021 - proceedings.neurips.cc

Recent work in scientific machine learning has developed so-called physics-informed neural
network (PINN) models. The typical approach is to incorporate physical domain knowledge …

Cited by 814, all 10 versions

[PDF] arxiv.org

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org

Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction on the time and cost of training …

Cited by 136, all 4 versions

[PDF] mlr.press

Revisiting weighted aggregation in federated learning with neural networks

Z Li, T Lin, X Shang, C Wu - International Conference on …, 2023 - proceedings.mlr.press

In federated learning (FL), weighted aggregation of local models is conducted to generate a
global model, and the aggregation weights are normalized (the sum of weights is 1) and …

Cited by 72, all 7 versions

[PDF] neurips.cc

Diverse weight averaging for out-of-distribution generalization

A Rame, M Kirchmeyer, T Rahier… - Advances in …, 2022 - proceedings.neurips.cc

Standard neural networks struggle to generalize under distribution shifts in computer vision.
Fortunately, combining multiple networks can consistently improve out-of-distribution …

Cited by 131, all 11 versions

[PDF] arxiv.org

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

Cited by 99, all 4 versions

PyHessian: Neural networks through the lens of the Hessian

Normalization techniques in training DNNs: Methodology, analysis and application
