Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers
Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide
the significant compute required by emerging intelligent personal assistant (IPA) workloads …
μlayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization
Emerging mobile services heavily utilize Neural Networks (NNs) to improve user
experiences. Such NN-assisted services depend on fast NN execution for high …
Understanding co-running behaviors on integrated CPU/GPU architectures
Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver
energy-efficient designs. It is still an open problem to effectively leverage the advantages of …
Study and evaluation of automatic GPU offloading method from various language applications
Y Yamato - International Journal of Parallel, Emergent and …, 2022 - Taylor & Francis
Heterogeneous hardware other than a small-core central processing unit (CPU) is
increasingly being used, such as a graphics processing unit (GPU), field-programmable …
Study and evaluation of improved automatic GPU offloading method
Y Yamato - International Journal of Parallel, Emergent and …, 2021 - Taylor & Francis
With the slowing down of Moore's law, the use of hardware other than CPUs, such as
graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), is increasing …
Graphie: Large-scale asynchronous graph traversals on just a GPU
Most GPU-based graph systems cannot handle large-scale graphs that do not fit in the GPU
memory. The ever-increasing graph size demands a scale-up graph system, which can run …
Regularized least absolute deviations regression and an efficient algorithm for parameter tuning
L Wang, MD Gordon, J Zhu - Sixth International Conference on …, 2006 - ieeexplore.ieee.org
Linear regression is one of the most important and widely used techniques for data analysis.
However, sometimes people are not satisfied with it because of the following two limitations …
Adaptive optimization for OpenCL programs on embedded heterogeneous systems
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in
today's embedded systems. These architectures offer potential for energy efficient computing …
Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
Heterogeneous systems are the core architecture of most of the high-performance
computing nodes, due to their excellent performance and energy efficiency. However, a key …
Simplifying programming and load balancing of data parallel applications on heterogeneous systems
Heterogeneous architectures have experienced a great development thanks to their
excellent cost/performance ratio and low power consumption. But heterogeneity significantly …