Po-Chi Shih, Kuo-Chan Huang, et al.
CCGrid 2011
Barrier synchronization, an essential mechanism that lets a block of threads guard data consistency, is commonly regarded as a threat to performance. This study, however, offers a different view of barrier synchronization on GPUs: adding barrier synchronization, even where it is functionally unnecessary, can improve the performance of some memory-intensive applications. We explain this phenomenon with a memory contention model in which artificial barrier synchronization helps reduce memory contention and preserve data access locality. To put this into practice, we identify a program pattern: artificial barrier synchronization can be used to synchronize memory accesses when the data locality among threads is violated. Empirical results from three real-world applications show that artificial barrier synchronization can increase performance by 10 to 20 percent. © 2014 IEEE.
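The locality argument in the abstract can be illustrated with a small toy model. This is a hypothetical sketch, not the paper's memory contention model: a block of threads walks an array row by row, every thread reads row i in iteration i, and each thread has an assumed, fixed cost per iteration. Without a barrier, fast threads run ahead and the block reads many rows at once; with a barrier at the end of each iteration (in CUDA this would correspond to calling `__syncthreads()` at the end of the loop body), the whole block stays on one row. The names `max_rows_in_flight` and `costs` are illustrative assumptions.

```python
# Toy model (hypothetical, not the paper's memory contention model):
# a block of threads walks a 2-D array row by row, and in iteration i every
# thread reads row i. Threads that stay on the same iteration share one row,
# which is the data locality an artificial barrier is meant to preserve.

def max_rows_in_flight(costs, iterations, barrier):
    """Return the largest number of distinct rows read in any one time step.

    costs[t] is an assumed number of time steps thread t needs per iteration.
    With barrier=True, a thread that finishes an iteration waits until every
    thread has finished it (like __syncthreads() at the end of the loop body).
    """
    n = len(costs)
    progress = [0] * n  # iteration (= row) each thread is currently on
    elapsed = [0] * n   # time steps spent so far on the current iteration
    worst = 0
    while min(progress) < iterations:
        slowest = min(progress)  # snapshot taken before anyone moves this step
        rows_this_step = set()
        for t in range(n):
            if progress[t] >= iterations:
                continue  # thread has finished all iterations
            if barrier and progress[t] > slowest:
                continue  # waiting at the barrier; issues no memory access
            rows_this_step.add(progress[t])
            elapsed[t] += 1
            if elapsed[t] == costs[t]:
                progress[t] += 1
                elapsed[t] = 0
        worst = max(worst, len(rows_this_step))
    return worst


if __name__ == "__main__":
    costs = [1, 2, 3, 4]  # hypothetical per-iteration costs for four threads
    print("no barrier:", max_rows_in_flight(costs, 8, barrier=False))
    print("barrier:   ", max_rows_in_flight(costs, 8, barrier=True))
```

With the assumed costs above, the unsynchronized block reads four different rows in the same time step, while the barrier version never reads more than one. The model deliberately ignores the time threads spend waiting at the barrier and the behavior of the memory system, so it illustrates only the locality half of the argument, not the 10 to 20 percent speedup reported in the abstract.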
Erh-Chung Chen, Pin-Yu Chen, et al.
CVPR 2024
Ying-Chieh Wang, Che-Rung Lee, et al.
IPDPSW 2014
Ying-Chieh Wang, I-Hsin Chung, et al.
HPCC-ICESS-CSS 2015