S. R. Nandakumar, Irem Boybat, et al.
IEDM 2020
In-memory computing is an emerging computing paradigm that enables deep-learning inference with significantly higher energy efficiency and lower latency. The essential idea is to map the synaptic weights of each layer to one or more in-memory computing (IMC) cores. During inference, these cores perform the associated matrix-vector multiplications in place with O(1) time complexity, obviating the need to move the synaptic weights to additional processing units. Moreover, this architecture enables the networks to be executed in a highly pipelined fashion. However, a key challenge is designing an efficient communication fabric for the IMC cores. In this work, we present one such communication fabric based on a graph topology that is well suited to the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between the graph representations of these networks and the graph corresponding to the proposed communication fabric. We then present a quantitative comparison with established communication topologies and show that the proposed topology achieves the lowest bandwidth requirement per communication channel. Finally, we present a hardware implementation and show a concrete example of mapping ResNet-32 onto an array of IMC cores interconnected via the proposed communication fabric.
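To illustrate the homomorphism argument, the sketch below shows a minimal, hypothetical example (not the paper's actual topology or proof): a mapping of CNN layers onto IMC cores is valid for pipelined execution when every layer-to-layer activation edge is carried by an existing channel of the fabric graph, i.e. the placement is a graph homomorphism. All names and the toy graphs are assumptions for illustration only.

```python
# Minimal sketch (hypothetical graphs and placement, not the paper's design):
# a placement (layer -> IMC core) is a graph homomorphism when every CNN edge
# maps onto a physical communication channel of the fabric.

def is_homomorphism(cnn_edges, fabric_edges, placement):
    """Check that every activation edge (src layer -> dst layer) is carried
    by a channel between the cores the two layers are placed on."""
    return all((placement[src], placement[dst]) in fabric_edges
               for src, dst in cnn_edges)

# Toy CNN fragment with a skip connection, as in ResNet-style blocks.
cnn_edges = {("conv1", "conv2"), ("conv2", "add"), ("conv1", "add")}

# Hypothetical fabric: three cores on a chain plus one bypass channel.
fabric_edges = {(0, 1), (1, 2), (0, 2)}

# Candidate mapping of layers to IMC cores.
placement = {"conv1": 0, "conv2": 1, "add": 2}

print(is_homomorphism(cnn_edges, fabric_edges, placement))  # True
```

Without the bypass channel (0, 2), the skip connection would have no direct channel and the check would fail, which is the kind of constraint the proposed fabric topology is designed to satisfy for all state-of-the-art CNNs.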
Haralampos Pozidis, Thomas Mittelholzer, et al.
IEEE Transactions on Magnetics
Abu Sebastian, Daniel Krebs, et al.
IRPS 2015
Manuel Le Gallo, Abu Sebastian, et al.
DRC 2019