Google Scholar

Tolerating memory latency through software-controlled pre-execution in simultaneous multithreadin...

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org

As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

Cited by 146 · All 3 versions

[PDF] arxiv.org

Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Cited by 91 · All 7 versions

[PDF] illinois.edu

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation

K Hsieh, S Khan, N Vijaykumar… - 2016 IEEE 34th …, 2016 - ieeexplore.ieee.org

Pointer chasing is a fundamental operation, used by many important data-intensive
applications (e.g., databases, key-value stores, graph processing workloads) to traverse …

Cited by 261 · All 20 versions
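Pointer chasing, as described in the snippet above, means the address of each load is only known once the previous load has returned, so the accesses cannot overlap. The minimal Python sketch below assumes a singly linked list. The names `Node`, `build_list`, and `find_hops` are hypothetical and used for illustration only. Python does not expose memory latency, so the sketch shows only the serial dependency chain, not its timing cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical linked-list node: a key plus a pointer to the next node."""
    key: int
    next: Optional["Node"] = None


def build_list(keys: List[int]) -> Optional[Node]:
    """Build a singly linked list whose nodes appear in the order of `keys`."""
    head: Optional[Node] = None
    for k in reversed(keys):
        head = Node(k, head)
    return head


def find_hops(head: Optional[Node], target: int) -> int:
    """Return how many nodes were visited to find `target`, or -1 if absent.

    Each iteration reads `node.next`, and the following iteration cannot
    start until that read completes. This serial chain of dependent loads
    is what makes pointer chasing latency-bound on real hardware.
    """
    hops = 0
    node = head
    while node is not None:
        hops += 1
        if node.key == target:
            return hops
        node = node.next  # dependent load: next address comes from this node
    return -1
```

For example, searching for `9` in a list built from `[5, 3, 9, 1]` visits three nodes, and each visit waits on the one before it.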

[PDF] ohio-state.edu

High performance RDMA-based MPI implementation over InfiniBand

J Liu, J Wu, SP Kini, P Wyckoff, DK Panda - Proceedings of the 17th …, 2003 - dl.acm.org

Although InfiniBand Architecture is relatively new in the high performance computing area, it
offers many features which help us to improve the performance of communication …

Cited by 643 · All 27 versions

[PDF] uwaterloo.ca

Runahead execution: An alternative to very large instruction windows for out-of-order processors

O Mutlu, J Stark, C Wilkerson… - The Ninth International …, 2003 - ieeexplore.ieee.org

Today's high performance processors tolerate long latency operations by means of out-of-
order execution. However, as latencies increase, the size of the instruction window must …

Cited by 630 · All 25 versions

[PDF] mit.edu

IMP: Indirect memory prefetcher

X Yu, CJ Hughes, N Satish, S Devadas - Proceedings of the 48th …, 2015 - dl.acm.org

Machine learning, graph analytics and sparse linear algebra-based applications are
dominated by irregular memory accesses resulting from following edges in a graph or non …

Cited by 198 · All 10 versions

[PDF] academia.edu

Multithreaded processors

T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com

The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …

Cited by 130 · All 12 versions

[PDF] ed.ac.uk

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org

Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …

Cited by 76 · All 9 versions
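The CSR format mentioned in the Prodigy snippet stores a graph as a `row_ptr` offset array and a `col_idx` neighbor array. Reading per-vertex data through `col_idx` produces loads of the form `values[col_idx[e]]`. This is the data-indirect, "follow the edges" access pattern that the IMP and Prodigy entries above refer to. The function name `csr_neighbor_sum` and the example graph are hypothetical, and this is a minimal Python sketch of the access pattern only, not either paper's mechanism.

```python
from typing import List


def csr_neighbor_sum(row_ptr: List[int], col_idx: List[int],
                     values: List[int]) -> List[int]:
    """For each vertex v, sum values[u] over every out-neighbor u of v.

    row_ptr[v] .. row_ptr[v + 1] delimits v's slice of col_idx.
    """
    out = []
    for v in range(len(row_ptr) - 1):
        total = 0
        # Streaming over col_idx is sequential (easy for a stride prefetcher).
        for e in range(row_ptr[v], row_ptr[v + 1]):
            u = col_idx[e]       # index load
            total += values[u]   # data-indirect load: values[col_idx[e]]
        out.append(total)
    return out
```

With edges 0→1, 0→2, 1→2, and 2→0 (`row_ptr = [0, 2, 3, 4]`, `col_idx = [1, 2, 2, 0]`) and `values = [10, 20, 30]`, the result is `[50, 30, 10]`. The addresses touched in `values` depend on the contents of `col_idx`, so a simple stride prefetcher cannot predict them.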

[PDF] acm.org

When prefetching works, when it doesn't, and why

J Lee, H Kim, R Vuduc - ACM Transactions on Architecture and Code …, 2012 - dl.acm.org

In emerging and future high-end processor systems, tolerating increasing cache miss
latency and properly managing memory bandwidth will be critical to achieving high …

Cited by 210 · All 13 versions

[PDF] researchgate.net

Microarchitecture optimizations for exploiting memory-level parallelism

Y Chou, B Fahs, S Abraham - ACM SIGARCH Computer Architecture …, 2004 - dl.acm.org

The performance of memory-bound commercial applications such as databases is limited by
increasing memory latencies. In this paper, we show that exploiting memory-level parallelism …

Cited by 289 · All 12 versions

Saved to library

Tolerating memory latency through software-controlled pre-execution in simultaneous multithreadin...

A survey of recent prefetching techniques for processor caches

Pythia: A customizable hardware prefetching framework using online reinforcement learning

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation

High performance RDMA-based MPI implementation over InfiniBand

Runahead execution: An alternative to very large instruction windows for out-of-order processors

IMP: Indirect memory prefetcher

Multithreaded processors

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

When prefetching works, when it doesn't, and why

Microarchitecture optimizations for exploiting memory-level parallelism