Gang Liu, Michael Sun, et al.
ICLR 2025
We give an optimal algorithm for broadcasting on SIMD hypercubes without the hardware support for indirect addressing; i.e., an indirect memory reference takes as many cycles as the number of different memory locations (offsets) accessed by all active processors. With communication restricted to one dimension at a time, our algorithm broadcasts m elements in m + n - 1 communication steps and 2m direct memory references in an n-cube. The well-known algorithm based on the spanning binomial tree needs mn communication steps and 2m direct memory references. A previous algorithm based on the n edge-disjoint spanning trees takes m + n - 1 communication steps but needs at least 2 mn direct memory references. We then give a reduction algorithm on SIMD hypercubes, which is optimal in terms of the communication times and memory references and nearly optimal (about a factor of n (n - 1) when m ≫ n) with respect to arithmetic time. We also generalize our algorithms to the case in which communication is restricted to k dimensions at a time, for any integer k in the range of 1 {slanted equal to or less-than} k ≤ n, with their optimality preserved. © 1991.
Gang Liu, Michael Sun, et al.
ICLR 2025
Hong Guan, Saif Masood, et al.
SoCC 2023
Vinayak Gupta, Rajmohan C, et al.
ICON 2022
Fahiem Bacchus, Joseph Y. Halpern, et al.
IJCAI 1995