Swagath Venkataramani, Jungwook Choi, et al.
IEEE Micro
A prevalent challenge for Deep Learning (DL) accelerators is how they are programmed to sustain high utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical given the trend toward complex network topologies and the emergence of eager execution. This work demonstrates that, on a set of image, object, and language networks, SPM management can bridge a performance gap of up to 5.2x in DL inference. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants: OnSRAM-Static, which works on static graphs and identifies data structures that should be held on-chip based on their properties, and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to hold or discard data structures. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager reduce inference latency (batch size of 1) by 1.02-4.8x and 1.02-3.1x, respectively, over a baseline with no SPM management.
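To make the static variant concrete, the idea of holding high-reuse data structures on-chip can be sketched as a greedy capacity-constrained selection over the tensors of a static graph. This is a minimal illustration, not the paper's actual algorithm: the `Tensor` representation, the benefit metric (off-chip traffic avoided, i.e. tensor size times its number of reuses), and the greedy policy are all assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes occupied in the SPM
    uses: list       # indices of the graph operations that read this tensor

def pin_tensors(tensors, spm_capacity):
    """Greedy sketch of static SPM management: pin tensors that avoid the
    most off-chip (DRAM) traffic, until SPM capacity is exhausted.

    Traffic avoided by pinning a tensor = size * (number of reuses after
    the first use). Tensors read only once gain nothing from pinning.
    """
    ranked = sorted(tensors,
                    key=lambda t: (len(t.uses) - 1) * t.size,
                    reverse=True)
    pinned, used = [], 0
    for t in ranked:
        if len(t.uses) > 1 and used + t.size <= spm_capacity:
            pinned.append(t.name)
            used += t.size
    return pinned

# Hypothetical tensors from a small static graph:
graph = [
    Tensor("A", size=100, uses=[0, 3, 5]),  # reused twice -> 200 bytes saved
    Tensor("B", size=300, uses=[1, 2]),     # reused once  -> 300 bytes saved
    Tensor("C", size=50,  uses=[2]),        # single use   -> nothing saved
]
print(pin_tensors(graph, spm_capacity=400))
```

The eager variant cannot rank tensors this way, since no graph (and hence no future-use information) is available at execution time; it instead speculates on which just-produced data structures to retain, which is why its gains (1.02-3.1x) trail the static variant's (1.02-4.8x).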