- Academic Search

Articles

Scholar

Results: 8 (0.02 s)

Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation

B Liu, MA Ojewale, Y Ding, M Canini - … of the 15th ACM SIGOPS Asia …, 2024 - dl.acm.org

We propose NeuronaBox, a flexible, user-friendly, and high-fidelity approach to emulate
DNN training workloads. We argue that to accurately observe performance, it is possible to …

Cited by 1

[PDF] arxiv.org

Entropy-Guided Attention for Private LLMs

NK Jha, B Reagen - arXiv preprint arXiv:2501.03489, 2025 - arxiv.org

The pervasiveness of proprietary language models has raised critical privacy concerns,
necessitating advancements in private inference (PI), where computations are performed …

[PDF] arxiv.org

AERO: Softmax-Only LLMs for Efficient Private Inference

NK Jha, B Reagen - arXiv preprint arXiv:2410.13060, 2024 - arxiv.org

The pervasiveness of proprietary language models has raised privacy concerns for users'
sensitive data, emphasizing the need for private inference (PI), where inference is performed …

[PDF] arxiv.org

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models

NK Jha, B Reagen - arXiv preprint arXiv:2410.09637, 2024 - arxiv.org

LayerNorm is a critical component in modern large language models (LLMs) for stabilizing
training and ensuring smooth optimization. However, it introduces significant challenges in …

[PDF] openreview.net

Understanding and Minimising Outlier Features in Transformer Training

B He, L Noci, D Paliotta, I Schlag, T Hofmann - The Thirty-eighth Annual … - openreview.net

Outlier Features (OFs) are neurons whose activation magnitudes significantly exceed the
average over a neural network's (NN) width. They are well known to emerge during standard …

Cited by 1

[PDF] openreview.net

Testing knowledge distillation theories with dataset size

G Lanzillotta, F Sarnthein, G Kur… - … 2024 Workshop on …, 2024 - openreview.net

The concept of knowledge distillation (KD) describes the training of a student model with a
teacher model and is a widespread technique in deep learning. However, it is still not clear …

[PDF] unifr.ch

Compositional visual reasoning and generalization with neural networks

A Stanić - 2024 - folia.unifr.ch

Deep neural networks (NNs) recently revolutionized the field of Artificial Intelligence, making
great progress in computer vision, natural language processing, complex game play …
