Abstract: Vector accelerators can efficiently execute regular data-parallel workloads, but they require expensive multi-ported register files to feed large vector ALUs. Recent work on in-situ ...
Abstract: This paper describes a bit-serial MAC accelerator building-block architecture that is optimized for low-power and cost-constrained systems. The architecture is implemented with both a mixed ...
Google has officially launched commercial development kits for its Edge Tensor Processing Unit (TPU) deep-learning accelerator hardware, both in the form of the previously promised system-on-module ...