C.A. Micchelli, W.L. Miranker
Journal of the ACM
The NorthPole Architecture achieves high performance with high efficiency by using local memory within a parallel, distributed core array, linked by networks-on-chip to ensure data availability and orchestrated by prescheduled, distributed local control. A 12 nm NorthPole Inference Chip (22B transistors, 795 mm²) includes a 256-Core Array with 192 MB of distributed SRAM. At a nominal frequency of 400 MHz, it delivers more than 200 TOPS at 8-bit, 400 TOPS at 4-bit, and 800 TOPS at 2-bit precision with very high utilization.
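The figures quoted in the abstract imply some per-core numbers that are easy to work out. The sketch below is only back-of-envelope arithmetic on the stated values. It assumes the SRAM is split evenly across the 256 cores, which the abstract does not state, and it treats the TOPS figures as lower bounds because they are given as "exceeding". The function and constant names are hypothetical, chosen for illustration.

```python
# Back-of-envelope figures derived only from numbers quoted in the abstract.
CORES = 256            # cores in the array
SRAM_MB = 192          # total distributed SRAM, in MB
FREQ_HZ = 400e6        # nominal clock frequency, in Hz
TOPS_FLOOR = {8: 200, 4: 400, 2: 800}  # precision in bits -> TOPS lower bound


def sram_per_core_mb(sram_mb=SRAM_MB, cores=CORES):
    """SRAM per core in MB, ASSUMING an even split across cores."""
    return sram_mb / cores


def min_ops_per_core_per_cycle(tops, cores=CORES, freq_hz=FREQ_HZ):
    """Lower bound on operations per core per clock cycle for a TOPS figure."""
    return tops * 1e12 / (cores * freq_hz)


if __name__ == "__main__":
    print(f"SRAM per core (even split assumed): {sram_per_core_mb()} MB")
    for bits, tops in TOPS_FLOOR.items():
        ops = min_ops_per_core_per_cycle(tops)
        print(f"{bits}-bit: at least {ops} ops per core per cycle")
```

Under these assumptions, each halving of precision doubles the implied per-core operation count, which matches the doubling of the TOPS figures in the abstract.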
Saurabh Paul, Christos Boutsidis, et al.
JMLR
Joxan Jaffar
Journal of the ACM
Cristina Cornelio, Judy Goldsmith, et al.
JAIR