Reducing GPU offload latency via fine-grained CPU-GPU synchronization
GPUs are seeing increasingly widespread use for general purpose computation due to their
excellent performance for highly-parallel, throughput-oriented applications. For many …
The effect of communication and synchronization on Amdahl's law in multicore systems
This work analyses the effects of sequential-to-parallel synchronization and inter-core
communication on multicore performance, speedup and scaling from Amdahl's law …
ZoneDefense: A fault-tolerant routing for 2-D meshes without virtual channels
Fault-tolerant routing is usually used to provide reliable on-chip communication for many-
core processors. This paper focuses on a special class of algorithms that do not use virtual …
DRLAR: A deep reinforcement learning-based adaptive routing framework for network-on-chips
S Wang, X Zhang, C Wang, K Wu, C Li, D Dong - Computer Networks, 2024 - Elsevier
Adaptive routing plays a pivotal role in the overall performance of Network-on-Chips (NoCs).
However, with many-core architectures supporting complex and constantly changing traffic …
Footprint: Regulating routing adaptiveness in networks-on-chip
B Fu, J Kim - Proceedings of the 44th Annual International …, 2017 - dl.acm.org
Routing algorithms can improve network performance by maximizing routing adaptiveness
but can be problematic in the presence of endpoint congestion. Tree-saturation is a well …
Shared-resource-centric limited preemptive scheduling: A comprehensive study of suspension-based partitioning approaches
This paper studies the problem of scheduling a set of hard real-time sporadic tasks that may
access CPU cores and a shared resource. Motivated by the observation that the CPU …
An operating system for safety-critical applications on manycore processors
Processor technology is advancing from bus-based multicores to network-on-chip-based
many cores, posing new challenges for operating system design. In this paper, we discuss …
Extendable pattern-oriented optimization directives
Algorithm-specific, that is, semantic-specific optimizations have been observed to bring
significant performance gains, especially for a diverse set of multi/many-core architectures …
Godson-T: An efficient many-core processor exploring thread-level parallelism
Godson-T is a research many-core processor designed for parallel scientific computing that
delivers efficient performance and flexible programmability simultaneously. It also has many …
MT-DMA: A DMA controller supporting efficient matrix transposition for digital signal processing
S Ma, Y Lei, L Huang, Z Wang - IEEE Access, 2018 - ieeexplore.ieee.org
Matrix transposition plays a critical role in digital signal processing. However, the existing
matrix transposition implementations have significant limitations. A traditional design uses …