IEEE Solid-State Circuits Magazine - Summer 2020 - 31

and inference of DNNs [5]. (Earlier DNN
benchmarking efforts, including DeepBench [6] and Fathom [7], have now
been subsumed by MLPerf.) The suite
comprises various types of DNNs (e.g.,
CNNs, recurrent neural networks, and
so on) for a variety of tasks, including
image classification, object identification, translation, speech to text,
recommendation, sentiment analysis,
and reinforcement learning.

Throughput and Latency
Throughput is used to indicate the
amount of data that can be processed or
the number of executions of a task that
can be completed in a given time period. High throughput is often critical to
an application. For instance, processing
video at 30 frames/s is often necessary
to deliver real-time performance. For
data analytics, high throughput means
that more data can be analyzed in a
given amount of time. As the amount
of visual data is growing exponentially,
high-throughput big data analytics
becomes increasingly important, particularly if an action needs to be taken
based on the analysis (e.g., security or
terrorism prevention, medical diagnosis,
or drug discovery). Throughput is often
generically reported as the number of
operations per second. In the case of
inference, throughput is reported as inferences per second.
Latency measures the time between
the input data's arrival to a system and
the generation of the result. Low latency is necessary for real-time interactive applications, such as augmented
reality, autonomous navigation, and
robotics. Latency is typically reported
in seconds per inference.
Throughput and latency are often
assumed to be directly derivable from
one another. However, they are actually
quite distinct. A prime example of this
is the well-known approach of batching
input data (e.g., batching multiple images or frames together for processing)
to increase throughput since batching
amortizes overhead such as loading
the weights; however, batching also
increases latency (e.g., at 30 frames/s
and a batch of 100 frames, some
frames will experience at least a 3.3-s
delay), which is not acceptable for real-time applications such as high-speed
navigation, where it would reduce the
time available for course correction.
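
The 3.3-s figure can be checked with a short calculation. The sketch below assumes that frames arrive at a fixed rate and that processing starts only once the batch is full; the function name `batch_fill_delay_s` is hypothetical and introduced purely for illustration.

```python
def batch_fill_delay_s(batch_size: int, frame_rate_fps: float) -> float:
    """Seconds the first frame of a batch waits for the batch to fill.

    Assumes frames arrive at a constant rate and processing begins only
    after the last frame of the batch has arrived. Processing time itself
    is not included, so this is a lower bound on the added latency.
    """
    if batch_size < 1 or frame_rate_fps <= 0:
        raise ValueError("batch_size must be >= 1 and frame_rate_fps > 0")
    # After the first frame arrives, (batch_size - 1) more frames must arrive.
    return (batch_size - 1) / frame_rate_fps


if __name__ == "__main__":
    # Values from the text: 30 frames/s and a batch of 100 frames.
    print(f"{batch_fill_delay_s(100, 30.0):.1f} s")  # prints "3.3 s"
```

With a batch of one frame, the same function returns zero, which is the no-batching case.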
Thus, achieving low latency and
high throughput simultaneously can
sometimes be at odds depending
on the approach, and both metrics
should be reported. The phenomenon
described here can also be understood
using Little's law [8] from queuing
theory, where the average throughput
and average latency are related by the
average number of tasks in flight, as
defined by:
\[
\text{throughput} = \frac{\text{tasks-in-flight}}{\text{latency}}.
\]

A DNN-centric version of Little's law
would have throughput measured
in inferences per second, latency
measured in seconds, and inferences in flight (as the tasks-in-flight
equivalent) measured in terms of
the number of images in a batch being processed simultaneously. This
helps to explain why increasing the
number of inferences in flight to
increase throughput may be counterproductive: some techniques
that increase the number of inferences in flight (e.g., batching) also
increase latency.
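
As a rough illustration of this relationship, the sketch below applies Little's law to two operating points. The batch sizes and latencies are made-up numbers chosen only to show the trade-off, and `throughput_inferences_per_s` is a hypothetical helper name.

```python
def throughput_inferences_per_s(inferences_in_flight: float, latency_s: float) -> float:
    """Little's law: average throughput = average tasks in flight / average latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return inferences_in_flight / latency_s


if __name__ == "__main__":
    # Hypothetical operating points (not measurements from any real system).
    no_batching = throughput_inferences_per_s(inferences_in_flight=1, latency_s=0.010)
    batch_of_8 = throughput_inferences_per_s(inferences_in_flight=8, latency_s=0.040)
    print(f"batch=1: {no_batching:.0f} inferences/s at 10 ms latency")
    print(f"batch=8: {batch_of_8:.0f} inferences/s at 40 ms latency")
```

In this made-up case, batching doubles the throughput while quadrupling the latency, which is why both metrics need to be reported.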
Several factors affect throughput
and latency. In terms of throughput,
the number of inferences per second
is affected by

\[
\frac{\text{inferences}}{\text{second}} = \frac{\text{operations}}{\text{second}} \times \frac{1}{\dfrac{\text{operations}}{\text{inference}}}, \tag{1}
\]

where the number of operations per
second is dictated by both the DNN
hardware and DNN model, while the
number of operations per inference
is dictated by the DNN model.
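
A minimal sketch of (1) follows. The numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they do not describe any particular hardware or model.

```python
def inferences_per_second(ops_per_second: float, ops_per_inference: float) -> float:
    """Equation (1): inferences/second = (operations/second) / (operations/inference)."""
    if ops_per_inference <= 0:
        raise ValueError("ops_per_inference must be positive")
    return ops_per_second / ops_per_inference


if __name__ == "__main__":
    # Hypothetical: hardware sustaining 1e11 operations/s on a model
    # that needs 2e9 operations per inference.
    print(inferences_per_second(1e11, 2e9))  # prints 50.0
```

Halving the operations per inference (a model change) and doubling the operations per second (a hardware change, or a model change that improves hardware efficiency) have the same effect on this metric.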
When considering a system comprising multiple PEs, where a PE corresponds to a simple or primitive
core that performs a single MAC operation, the number of operations per
second can be further decomposed
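as follows:

\[
\frac{\text{operations}}{\text{second}} = \left( \frac{1}{\dfrac{\text{cycles}}{\text{operation}}} \times \frac{\text{cycles}}{\text{second}} \right) \times \text{number of PEs} \times \text{utilization of PEs}. \tag{2}
\]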

The first term reflects the peak
throughput of a single PE, the second
term reflects the amount of parallelism, and the last term reflects degradation due to the inability of the
architecture to effectively utilize the
PEs. Since the main operation for processing DNNs is a MAC operation, we
use the terms number of operations
and number of MAC operations interchangeably.
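
The decomposition in (2) can be sketched as a product of three factors. The parameter values below are hypothetical, and utilization is assumed to be expressed as a fraction between 0 and 1.

```python
def operations_per_second(
    clock_hz: float, cycles_per_op: float, num_pes: int, utilization: float
) -> float:
    """Equation (2): peak per-PE throughput x number of PEs x utilization of PEs."""
    if cycles_per_op <= 0:
        raise ValueError("cycles_per_op must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction in [0, 1]")
    # First term: (1 / (cycles/operation)) x (cycles/second) for a single PE.
    peak_ops_per_pe = clock_hz / cycles_per_op
    return peak_ops_per_pe * num_pes * utilization


if __name__ == "__main__":
    # Hypothetical design: 1-GHz clock, single-cycle MAC, 256 PEs, 75% utilized.
    single_cycle = operations_per_second(1e9, 1, 256, 0.75)
    # Same design with a hypothetical 4-cycle (e.g., bit-serial) MAC.
    multicycle = operations_per_second(1e9, 4, 256, 0.75)
    print(f"{single_cycle:.3e} vs. {multicycle:.3e} operations/s")
```

The second case shows how more cycles per operation lowers the first term even when the clock frequency, PE count, and utilization are unchanged.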
One can increase the peak throughput of a single PE by increasing the
number of cycles per second, which
corresponds to a higher clock frequency achieved by reducing the critical
path at the circuit or microarchitectural level; alternatively, one can also
reduce the number of cycles per operation, which can be affected by the
design of the MAC (e.g., a bit-serial,
multicycle MAC would have more cycles per operation).
While these approaches increase
the throughput of a single PE, the
overall throughput can be increased
by increasing the number of PEs and,
thus, the maximum number of MAC
operations that can be performed in
parallel. The number of PEs is dictated by the area of the PE and the
area cost of the system. If the area
cost of the system is fixed, then increasing the number of PEs requires
either reducing the area per PE or
trading off on-chip storage area for
more PEs. Reducing on-chip storage,
however, can affect the utilization of
PEs, which we discuss next.
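
The fixed-area trade-off can be illustrated with a deliberately simple budget model. This is an assumption made for illustration, not a model given in the text: the chip area is split only between on-chip storage and identical PEs, interconnect and control overheads are ignored, and all areas are hypothetical integers in square micrometers.

```python
def max_pes(total_area_um2: int, storage_area_um2: int, area_per_pe_um2: int) -> int:
    """Number of PEs that fit in the area left over after on-chip storage.

    Assumed toy model: total area = storage area + (number of PEs x area per PE).
    """
    if area_per_pe_um2 <= 0:
        raise ValueError("area_per_pe_um2 must be positive")
    if storage_area_um2 > total_area_um2:
        raise ValueError("storage does not fit in the total area")
    return (total_area_um2 - storage_area_um2) // area_per_pe_um2


if __name__ == "__main__":
    baseline = max_pes(10_000_000, 6_000_000, 20_000)      # 200 PEs
    smaller_pe = max_pes(10_000_000, 6_000_000, 16_000)    # 250 PEs
    less_storage = max_pes(10_000_000, 4_000_000, 20_000)  # 300 PEs
    print(baseline, smaller_pe, less_storage)
```

The model captures only the PE count. It says nothing about how the reduced storage affects the utilization of those PEs.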
Reducing the area per PE can also
be achieved by reducing the logic associated with delivering operands to a
MAC. This can be achieved by controlling multiple MAC operations with a
single piece of logic. This is analogous
to the situation in instruction-based
systems, such as CPUs and graphics
processing units (GPUs), that reduce



