Assisted by an in-house AI deep compression toolchain (ezLabel, ezModel, ezQUANT, ezHybrid-M), the proposed technology supports automatic AI model design and optimization, achieving an integrated 120x model size reduction and 70x power reduction on a 2D CNN model. It also delivers a world-first 1/2/4/8-bit CNN model realized on the developed high-efficiency hybrid fixed-point CNN NPU (Hybrid-NPU). The Hybrid-NPU has been verified on a Xilinx ZCU102 FPGA and achieves up to 2.5 TOPS (8-bit) / 20 TOPS (1-bit) in 28 nm technology running at 550 MHz, with an energy efficiency of 4 TOPS/W.
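To make the 1/2/4/8-bit fixed-point idea concrete, here is a minimal Python/NumPy sketch of generic symmetric uniform quantization of a weight tensor to k bits. It is an illustrative assumption only: it is not ezQUANT's algorithm or the Hybrid-NPU's actual number format, and `quantize_symmetric` is a hypothetical name. The 1-bit case binarizes weights to ±1 with a mean-absolute-value scale, which is one common choice among several.

```python
import numpy as np


def quantize_symmetric(x, bits):
    """Hypothetical sketch: quantize x to signed `bits`-bit integer codes.

    Returns (codes, scale) so that codes * scale approximates x.
    This is generic symmetric uniform quantization, assumed for
    illustration; it is not the toolchain's documented method.
    """
    x = np.asarray(x, dtype=np.float64)
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        # All-zero tensor: any scale works; codes are all zero.
        return np.zeros(x.shape, dtype=np.int32), 1.0
    if bits == 1:
        # Binarization: sign of each weight, scaled by the mean magnitude.
        scale = float(np.mean(np.abs(x)))
        codes = np.where(x >= 0, 1, -1).astype(np.int8)
        return codes, scale
    # Symmetric range [-qmax, qmax], e.g. qmax = 127 for 8 bits.
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return codes, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)  # stand-in weights, not real model data
    for b in (1, 2, 4, 8):
        q, s = quantize_symmetric(w, b)
        err = np.mean(np.abs(w - q * s))
        # 32 / b is the storage reduction versus 32-bit floats.
        print(f"{b}-bit: storage reduction {32 / b:.0f}x, mean abs error {err:.4f}")
```

Assuming a 32-bit floating-point baseline, storing k-bit codes shrinks weight storage by 32/k (32x at 1 bit, 4x at 8 bits), so bit-width reduction alone accounts for at most 32x of any size reduction.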