IEEE Circuits and Systems Magazine - Q3 2023 - 1

Volume 23, Number 3
Third Quarter 2023
8 Tiny Machine Learning: Progress and Futures
Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, and Song Han
Tiny machine learning (TinyML) is a new frontier of machine learning. By squeezing deep learning
models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI
applications and enable ubiquitous intelligence. However, TinyML is challenging due to the hardware
constraints: the tiny memory resource makes it difficult to hold deep learning models designed for cloud and
mobile platforms. There is also limited compiler and inference engine support for bare-metal devices.
Therefore, we need to co-design the algorithm and system stack to enable TinyML. In this review,
we will first discuss the definition, challenges, and applications of TinyML. We then survey the recent
progress in TinyML and deep learning on MCUs. Next, we will introduce MCUNet, showing how we
can achieve ImageNet-scale AI applications on IoT devices with system-algorithm co-design. We will
further extend the solution from inference to training and introduce tiny on-device training techniques.
Finally, we present future directions in this area. Today's "large" model might be tomorrow's "tiny" model.
The scope of TinyML should evolve and adapt over time.
35 Machine Learning Hardware Design for Efficiency,
Flexibility, and Scalability
Jie-Fang Zhang and Zhengya Zhang
The widespread use of deep neural networks (DNNs) and DNN-based machine learning (ML)
methods justifies treating DNN computation as a workload class in its own right. Beginning with a brief review of DNN
workloads and computation, we provide an overview of single instruction multiple data (SIMD) and
systolic array architectures. These two basic architectures support the kernel operations for DNN computation,
and they form the core of many flexible DNN accelerators. To enable higher performance
and efficiency, sparse DNN hardware can be designed to exploit data sparsity. We present common
approaches, from compressed storage to sparse data processing, that reduce memory and bandwidth
usage and improve energy efficiency and performance. To accommodate the fast evolution of new
models of larger size and higher complexity, modular chiplet integration can be a promising path to
meet the growing needs. We show recent work on homogeneous tiling and heterogeneous integration
to scale up and scale out hardware to support larger models with more complex functions.
54 Challenges in Precision Continuous-Time
Delta-Sigma Data Converter Design
Raviteja Theertham and Shanthi Pavan
We describe challenges encountered in the design of continuous-time delta-sigma modulators that
target high resolution (>16 bits) over wide bandwidths (several hundred kHz). The linearity of the
feedback DAC and flicker noise introduced by the loop filter are primary problems that need to be
addressed. We describe two techniques that are inherently more linear than prior-art DACs, namely
the virtual-ground-switched resistor DAC and the zapped virtual-ground-switched dual return-to-open
DAC. Flicker noise can be eliminated by chopping, but one needs to pay careful attention to minimize
chopping artifacts. Example multi-bit and single-bit designs achieving in excess of 100 dB SNDR over
a 250 kHz bandwidth, designed in a 180 nm CMOS technology, are used to illustrate the efficacy of
the techniques described in this article.
Digital Object Identifier 10.1109/MCAS.2023.3306551

