NeuroSim stands out as a circuit-level macro model for benchmarking CIM architectures based on SRAM and eNVMs; it estimates circuit-level performance metrics such as chip area, latency, dynamic energy, and leakage power. [...] can only be held in off-chip DRAM. Our estimates for CIM-based training architectures (e.g., MINT [87] or CIMAT [52]) show that the performance bottleneck is mainly DRAM access. Although a peak of several thousand TOPS/W (normalized to 1-bit by 1-bit MAC) is expected for on-chip computation in feedforward/backward propagation, the overall system's energy efficiency is limited to hundreds of TOPS/W (normalized to 1-bit by 1-bit MAC). The degradation factor is about 3.3×, owing to the expensive DRAM access energy (~4 pJ/bit) of the high-performance HBM2 interface [96]. Fig. 9(b) shows the energy breakdown for a 7 nm SRAM-based CIM training (in feedforward, error calculation, gradient calculation, weight