Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization. Tian Jin, Seokin Hong. 2019. ASPLOS 2019.
Efficient fork-join on GPUs through warp specialization. Arpith Chacko Jacob, Alexandre E. Eichenberger, et al. 2017. HiPC 2017.
Performance analysis and optimization of Clang's OpenMP 4.5 GPU Support. Matt Martineau, Simon McIntosh-Smith, et al. 2016. PMBS 2016.