of half-frame pre-emphasis results; 3) An 8-stage R2SDF FFT structure is used to achieve the frequency conversion operation with low power consumption; 4) For the real and imaginary parts of the FFT output, the magnitude is computed either by directly summing their absolute values or by using the 16-stage pipelined CORDIC [14] unit to calculate the square root; the choice is selected by operating mode. The memory size of the proposed MFCC architecture is 0.953 KB, and all intermediate parameters are fixed in three ROMs, including a) the ROM for the Hamming windowing parameters (128 × 10 bits); b) the ROM for the real-part butterfly coefficients (256 × 10 bits); c) the ROM for the imaginary-part butterfly coefficients (256 × 10 bits); and d) the ROM for the Mel filter parameters (256 × 10 bits).

Table IV. Comparison with state-of-the-art MFCC designs for low-power speech keyword recognition.

| | JSSC'20 [8] | VLSI'18 [9] | TCAS-I'20 [12] | This work |
|---|---|---|---|---|
| Technology | 65 nm | 65 nm | 22 nm | 22 nm |
| Frequency | 250 kHz | 2.5 MHz | 250 kHz | 150 kHz |
| DNN Structure | LSTM+FC | CONV+FC | BWN+FC | BWN+FC |
| DNN Bit-width (weight) | 4/8 bits | 1 bit | 1 bit | 1 bit |
| DNN Bit-width (data) | 10 bits | 1 bit | 16 bits | 16 bits |
| Latency | 16 ms | 0.5~25 ms | 16 ms | 16 ms |
| Voltage | 0.6 V | 0.57 V | 0.6 V | 0.6 V/0.4 V |
| MFCC Layout Area | 0.44 mm² | 0.17 mm² | 0.099 mm² | 0.047 mm² |
| MFCC Memory Size | 3.25 KB | 4 KB | 1.57 KB | 0.953 KB |
| Number of Keywords | 10 | 1 | 10 | 10 |
| MFCC Power Consumption | 5.3 nW | 9.2 nW | 2.8 nW@LP, 5.1 nW@HP | 0.728 nW@LP, 1.21 nW@HP |
| Database | GSCD | TIDIGIT | GSCD | GSCD |
| Recognition Accuracy | 90.87%@Clean | 95%@Clean, 88%@10 dB, 85%@5 dB | 87.9%@Clean, 84.4%@10 dB, 80.8%@5 dB | 88.6%@Clean, 83.4%@10 dB, 79.9%@5 dB, 85.97%@mix |

IV. Implementation Results

The proposed MFCC is synthesized in the TSMC 22 nm ULL CMOS process using high-threshold (HVT) transistors for low leakage power. Table III summarizes the power consumption, area, and other detailed parameters of this module. Fig. 9 compares the power consumption of the proposed MFCC with that of [12]; a detailed power consumption ratio is presented at the bottom of Fig. 9. To evaluate the power consumption and recognition accuracy of the proposed MFCC design, the prototype keyword recognition system shown in Fig. 1 is implemented and evaluated in TSMC 22 nm ULL process technology.

The prototype system is functional at a logic supply voltage of 0.6 V/0.4 V and a clock frequency of 150 kHz. The MFCC module operates on 32 ms frames with a 16 ms time step at an 8 kHz sample rate. The power consumption is evaluated with Synopsys PTPX at the TT corner and 25 °C. The layout of the prototype system is shown in Fig. 10. The area of the whole prototype system is 0.1748 mm², and the area of the MFCC macro is 0.047 mm². The BWN adopted in the prototype for feature classification is trained to classify input speech into one of the 10 keywords. During the experiment, we mixed the feature parameters extracted under different SNRs into a database to be

Figure 10. Layout of the speech keywords recognition prototype system with proposed MFCC. (Labeled blocks: MFCC Unit, BWN Unit, ADC and UART, Weight Memory, Data Memory; scale bars: 500 µm.)
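The frame timing stated in the text (32 ms frames, 16 ms step, 8 kHz sampling, 150 kHz clock) and the two selectable FFT magnitude computations can be illustrated with a short Python sketch. This is not the authors' implementation: the function names are hypothetical, floating-point arithmetic replaces the fixed-point hardware datapath, and `math.hypot` stands in for the CORDIC square-root unit.

```python
import math

# Values taken from the text; the names themselves are illustrative only.
SAMPLE_RATE_HZ = 8_000
FRAME_MS = 32
STEP_MS = 16
CLOCK_HZ = 150_000


def frame_parameters(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS,
                     step_ms=STEP_MS, clock_hz=CLOCK_HZ):
    """Return (samples per frame, samples per step, clock cycles per step)."""
    frame_len = sample_rate_hz * frame_ms // 1000
    hop_len = sample_rate_hz * step_ms // 1000
    cycles_per_step = clock_hz * step_ms // 1000
    return frame_len, hop_len, cycles_per_step


def magnitude_abs_sum(re, im):
    """Approximate magnitude: sum of absolute values, no multiplier or root."""
    return abs(re) + abs(im)


def magnitude_sqrt(re, im):
    """Exact magnitude sqrt(re^2 + im^2); assumed stand-in for the CORDIC path."""
    return math.hypot(re, im)


if __name__ == "__main__":
    print(frame_parameters())
    print(magnitude_abs_sum(3, -4), magnitude_sqrt(3, -4))
```

Under these parameters each frame holds 256 samples, which matches the 256-point transform of an 8-stage radix-2 FFT pipeline, and roughly 2400 clock cycles are available per 16 ms step. The absolute-sum form is cheaper to compute but overestimates the true magnitude by up to a factor of √2.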