Demystifying BERT: System design implications
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …
vTrain: A simulation framework for evaluating cost-effective and compute-optimal large language model training
As large language models (LLMs) become widespread in various application domains, a
critical challenge the AI community is facing is how to train these large AI models in a cost …
Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads
C Avalos Baddouh, M Khairy, RN Green… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost. Cycle-level
simulation is orders of magnitude slower than native silicon; the only solution is to …
Demystifying BERT: Implications for accelerator design
Transfer learning in natural language processing (NLP), as realized using models like BERT
(Bi-directional Encoder Representation from Transformer), has significantly improved …
Sieve: Stratified GPU-compute workload sampling
To exploit the ever-increasing compute capabilities offered by GPU hardware, GPU-compute
workloads have evolved from simple computational kernels to large-scale programs with …
Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware
Scaling neural network models has delivered dramatic quality gains across ML problems.
However, this scaling also increased the reliance on efficient distributed training techniques …
Global Optimizations & Lightweight Dynamic Logic for Concurrency
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …
[PDF][PDF] Simulating Machine Learning Models at Scale
V Ramadas, MD Sinclair - SRC TECHCON, 2024 - pages.cs.wisc.edu
In recent years, deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …
[PDF][PDF] Simulation Support for Fast and Accurate Large-Scale GPGPU & Accelerator Workloads
In recent years, deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …
TPUPoint: Automatic characterization of hardware-accelerated machine-learning behavior for cloud computing
With the share of machine learning (ML) workloads in data centers rapidly increasing, cloud
providers are beginning to incorporate accelerators such as tensor processing units (TPUs) …