- Academic Search

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs

R Jain, VM Bhasi, A Jog… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org

Personalized recommendation is a ubiquitous application on the internet, with many
industries and hyperscalers extensively leveraging Deep Learning Recommendation …

[PDF] github.io

SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism

T Guo, X Huang, K Wu, X Zhang, N Xiao - … of the 61st ACM/IEEE Design …, 2024 - dl.acm.org

While designed for massive parallelism, GPUs are frequently suffering from low thread
occupancy and limited data throughput, which are typically attributed to constrained on-chip …

[PDF] pasalabs.org

[PDF] Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory

J Ren, B Ma, S Yang, B Francis, EK Ardestani, M Si… - pasalabs.org

Deep learning recommendation models (DLRMs) are widely used in industry, and their
memory capacity requirements reach the terabyte scale. Tiered memory architectures …

AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
