An optimizing pipeline stall reduction algorithm for power and performance on multi-core CPUs
V Saravanan, KD Pralhaddas, DP Kothari… - … -centric Computing and …, 2015 - Springer
The power-performance trade-off is one of the major considerations in micro-architecture
design. Pipelined architecture has brought a radical change in the design to capitalize on …
Efficient warp execution in presence of divergence with collaborative context collection
GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …
Generative data models for validation and evaluation of visualization techniques
We argue that there is a need for substantially more research on the use of generative data
models in the validation and evaluation of visualization techniques. For example, user …
GPU subwarp interleaving
Raytracing applications have naturally high thread divergence, low warp occupancy and are
limited by memory latency. In this paper, we present an architectural enhancement called …
Speculative reconvergence for improved SIMT efficiency
GPUs perform most efficiently when all threads in a warp execute the same sequence of
instructions convergently. However, when threads in a warp encounter a divergent branch …
Device and method for scheduling multiple thread groups on SIMD lanes upon divergence in a single thread group
SH ** - US Patent 10,831,490, 2020 - Google Patents
Provided are an apparatus and a method for effectively managing threads diverged by a
conditional branch based on Single Instruction Multiple-based Data (SIMD). The apparatus …
An efficient algorithm for the calculation of sub-grid distances for higher-order LBM boundary conditions in a GPU simulation environment
This paper presents a new and efficient algorithm for the calculation of sub-grid distances in
the context of a lattice Boltzmann method (LBM). LBMs usually operate on equidistant …
Eliminating intra-warp load imbalance in irregular nested patterns via collaborative task engagement
Nested patterns are one of the most frequently occurring algorithmic themes in GPU
applications where coarse-grained tasks are constituted from a number of fine-grained ones …
System, method, and computer program product for managing divergences and synchronization points during thread block execution by using a double sided queue …
O Giroux, GF Diamos - US Patent 9,459,876, 2016 - Google Patents
BACKGROUND: Threads (i.e., an abstract construct of an instance of a program executing on
a processor) have a basic guarantee of forward progress. In other words, if one thread …
CUIRRE: An open-source library for load balancing and characterizing irregular applications on GPUs
Abstract: While Graphics Processing Units (GPUs) show high performance for problems with
regular structures, they do not perform well for irregular tasks due to the mismatches …