In recent years, convolutional neural networks (CNNs) have been widely used in many image-related machine learning applications because of their high accuracy in image recognition. Since CNNs involve an enormous number of computations, it is necessary to accelerate them with hardware accelerators such as FPGA, GPU, and ASIC designs. However, CNN accelerators face a critical problem: the large time and power cost of off-chip memory accesses. Here, we describe two methods for optimizing CNN accelerators, reducing data precision and reusing data, which improve accelerator performance under a limited on-chip buffer. Three factors that influence data reuse are proposed and analyzed: loop execution order, reuse strategy, and parallelism strategy. Based on this analysis, we enumerate all legal design possibilities and identify the optimal hardware design with low off-chip memory access and a small buffer size. In this way, we can effectively improve the performance and reduce the power consumption of the accelerator.
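The enumeration described above can be sketched in a few lines of Python. This is a minimal, illustrative model only: the tiling parameters (Tn, Tm, Tr, Tc), the output-stationary reuse assumption, the access-count formulas, and the example layer sizes are assumptions for the sketch, not the chapter's actual cost model or hardware design.

```python
import math
from itertools import product

def dram_accesses(N, M, R, C, K, Tn, Tm, Tr, Tc):
    """Estimated off-chip words moved for one conv layer
    (N output channels, M input channels, R x C outputs, K x K kernel),
    assuming an output-stationary loop order: partial sums stay on chip
    while the input-channel loop runs, so outputs are written once per
    (n, r, c) tile, while inputs and weights reload on every tile trip."""
    trips_n = math.ceil(N / Tn)
    trips_m = math.ceil(M / Tm)
    trips_r = math.ceil(R / Tr)
    trips_c = math.ceil(C / Tc)
    in_tile = Tm * (Tr + K - 1) * (Tc + K - 1)  # input patch per tile (stride 1)
    w_tile = Tn * Tm * K * K                    # weights per tile
    out_tile = Tn * Tr * Tc                     # outputs per tile
    return ((in_tile + w_tile) * trips_n * trips_m * trips_r * trips_c
            + out_tile * trips_n * trips_r * trips_c)

def best_design(N, M, R, C, K, buf_limit):
    """Exhaustively enumerate legal tilings and return the one with the
    fewest off-chip accesses whose buffers fit in buf_limit words."""
    best = None
    for Tn, Tm, Tr, Tc in product(range(1, N + 1), range(1, M + 1),
                                  range(1, R + 1), range(1, C + 1)):
        # on-chip buffer = input tile + weight tile + output tile
        buf = (Tm * (Tr + K - 1) * (Tc + K - 1)
               + Tn * Tm * K * K + Tn * Tr * Tc)
        if buf > buf_limit:
            continue  # design is illegal: does not fit on chip
        acc = dram_accesses(N, M, R, C, K, Tn, Tm, Tr, Tc)
        if best is None or acc < best[0]:
            best = (acc, (Tn, Tm, Tr, Tc), buf)
    return best

# Example: a small hypothetical layer with a 512-word on-chip buffer.
acc, tiles, buf = best_design(N=8, M=3, R=8, C=8, K=3, buf_limit=512)
```

Even this toy search shows the trade-off the chapter analyzes: larger tiles raise buffer usage but cut the number of reload trips, so the best legal design sits at the buffer limit rather than at either extreme.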
Part of the book: Green Electronics