IEEE Solid-State Circuits Magazine - Summer 2020 - 32
instruction bookkeeping overhead
by using large aggregate instructions
(e.g., single-instruction, multiple-data
(SIMD), vector instructions, single-instruction, multiple-threads (SIMT), or
tensor instructions), where a single instruction can be used to initiate multiple operations.
The number of PEs and the peak
throughput of a single PE indicate
only the theoretical maximum
throughput (i.e., peak performance)
when all PEs are performing computation (100% utilization). In reality,
the achievable throughput depends
on the actual utilization of those PEs,
which is affected by several factors
as follows:
$$\text{utilization of PEs} = \frac{\text{number of active PEs}}{\text{number of PEs}} \times \text{utilization of active PEs}. \tag{3}$$
The first term reflects the ability to distribute the workload to PEs,
while the second term reflects how
efficiently those active PEs are processing the workload. The number of
active PEs is the number of PEs that
receive work (the ratio of active PEs
to the total number of PEs can be referred to as the active PE percentage);
therefore, it is desirable to distribute
the workload to as many PEs as possible.
The ability to distribute the workload is determined by the flexibility
of the architecture, for instance, the
on-chip network, to support the different layer shapes in a DNN model as
explored in [9] and [10].
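As a worked illustration of (3), the following minimal Python sketch multiplies the two terms and then scales a peak throughput by the result. The function names and the numbers in the example (a 256-PE array, 192 active PEs, active PEs busy 80% of the time) are assumptions chosen for illustration, not values from the article.

```python
def pe_utilization(active_pes, total_pes, active_pe_utilization):
    """Equation (3): (active PEs / total PEs) * utilization of active PEs."""
    if total_pes <= 0:
        raise ValueError("total_pes must be positive")
    if not 0 <= active_pes <= total_pes:
        raise ValueError("active_pes must be between 0 and total_pes")
    if not 0.0 <= active_pe_utilization <= 1.0:
        raise ValueError("active_pe_utilization must be in [0, 1]")
    return (active_pes / total_pes) * active_pe_utilization


def achievable_throughput(peak_ops_per_s, utilization):
    """Scale the theoretical peak throughput by the overall PE utilization."""
    return peak_ops_per_s * utilization


# Hypothetical numbers, chosen only to illustrate the arithmetic.
u = pe_utilization(active_pes=192, total_pes=256, active_pe_utilization=0.8)
print(u)                                # first term 0.75, second term 0.8
print(achievable_throughput(1.0e12, u)) # peak of 1 Tops/s is an assumption
```

In this example, three quarters of the PEs receive work and each of those is busy 80% of the time, so the array as a whole delivers roughly 60% of its peak.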
Within the constraints of the on-chip network, the number of active
PEs is also determined by the specific
allocation of work to PEs by the mapping process. The mapping process
involves the placement and scheduling in space and time of every MAC
operation (including the delivery of
the appropriate operands) onto the
PEs. The mapper can be thought of
as a compiler for the DNN processor
[11]. The mapping process, on a layer-by-layer basis, is explored in detail
in [12]-[14]. Additional challenges regarding the flexibility of mapping are
discussed in the "Energy Efficiency
and Power Consumption" section.
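To make the link between mapping and the active PE percentage concrete, here is a deliberately simplified sketch. It assumes a hypothetical mapper that assigns independent work items (for example, one output channel each) to a 1D array of PEs in successive passes. The real mapping problem also schedules operand delivery in space and time, which this toy model ignores.

```python
import math


def active_pe_percentage(work_items, total_pes):
    """Average fraction of PEs that receive work under a toy mapping.

    Assumption: each PE handles at most one work item per pass, and
    passes are issued until all work items have been processed.
    """
    if work_items <= 0 or total_pes <= 0:
        raise ValueError("work_items and total_pes must be positive")
    passes = math.ceil(work_items / total_pes)
    return work_items / (passes * total_pes)


# Hypothetical layer shapes mapped onto a hypothetical 64-PE array.
for channels in (64, 96, 16):
    print(channels, active_pe_percentage(channels, 64))
```

With 96 work items, the first pass fills all 64 PEs but the second pass fills only 32, so the average active PE percentage is 75%. With 16 work items, only a quarter of the array ever receives work.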
The utilization of active PEs is
largely dictated by the timely delivery of work to the PEs such that the
active PEs do not become idle while
waiting for the data to arrive. This
can be affected by the bandwidth
(BW) and latency of the (on-chip and
off-chip) memory and network. The
BW requirements can be affected by
the amount of data reuse available
in the DNN model and the amount
of data reuse that can be exploited
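[Figure 5 plot: performance (operations/s) versus operational intensity (operations/byte). Labeled features: a BW-limited region with slope = BW, an inflection point at the optimal operational intensity, and a computation-limited region at peak performance.]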
FIGURE 5: The roofline model. The peak operations per second is indicated by the bold line;
when the operational intensity (dictated by the amount of compute per byte of data) is low,
the operations per second is limited by the data delivery. The design goal is to operate as
closely as possible to the peak operations per second for the operational intensity of a given
workload.
by the memory hierarchy and dataflow. The dataflow determines the
order of operations and where data
are stored and reused. The amount
of data reuse can also be increased
using a larger batch size, which is
one of the reasons that increasing
batch size can increase throughput.
The challenges of data delivery and
memory BW are discussed in [14] and
[15]. The utilization of active PEs can
also be affected by the imbalance
of work allocated across PEs, which
may occur when exploiting sparsity
(i.e., avoiding unnecessary work associated with multiplications by
zero); PEs with less work become idle
and, thus, have lower utilization.
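The effect of workload imbalance can be sketched as follows. The model is an assumption made for illustration: each active PE is given a list of weights, skips the multiplications by zero, and all PEs wait for the slowest one to finish, so the utilization of active PEs is the mean work divided by the maximum work.

```python
def nonzero_macs(weights):
    """Number of MACs left after skipping multiplications by zero."""
    return sum(1 for w in weights if w != 0)


def active_pe_utilization(work_per_pe):
    """Mean work / max work across active PEs (all wait for the slowest)."""
    if not work_per_pe:
        raise ValueError("need at least one active PE")
    busiest = max(work_per_pe)
    if busiest <= 0:
        raise ValueError("at least one PE must have work")
    return sum(work_per_pe) / (len(work_per_pe) * busiest)


# Hypothetical weights assigned to four PEs; zeros are skipped.
weights_per_pe = [
    [3, 0, 1, 2],
    [0, 0, 5, 0],
    [1, 1, 1, 1],
    [0, 2, 0, 4],
]
work = [nonzero_macs(w) for w in weights_per_pe]  # [3, 1, 4, 2]
print(active_pe_utilization(work))                # 10 / (4 * 4)
```

Here the PE holding four nonzero weights sets the pace, and the PE with a single nonzero weight sits idle for three quarters of the time.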
There is also an interplay between
the number of PEs and the utilization
of PEs. For instance, one way to reduce
the likelihood that a PE needs to wait
for data is to store some data locally
near or within the PE. However, this
requires increasing the chip area
allocated to on-chip storage, which,
given a fixed chip area, would reduce
the number of PEs. Therefore, a key
design consideration is the area allocation between compute (which
increases the number of PEs) versus
on-chip storage (which increases the
utilization of PEs).
The impact of these factors can be
captured using Eyexam, a systematic
way of understanding the performance limits for DNN processors as
a function of specific characteristics
of the DNN model and processor design. Eyexam includes and extends
the well-known roofline model [16].
The roofline model, as illustrated in
Figure 5, relates average BW demand
and peak computational ability to
performance.
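The roofline relationship in Figure 5 can be written as performance = min(peak performance, BW x operational intensity). The sketch below is a minimal illustration; the peak performance and BW values are hypothetical and do not describe any particular processor.

```python
def roofline(operational_intensity, peak_ops_per_s, bw_bytes_per_s):
    """Attainable operations/s for a given operational intensity (ops/byte)."""
    return min(peak_ops_per_s, bw_bytes_per_s * operational_intensity)


def inflection_point(peak_ops_per_s, bw_bytes_per_s):
    """Operational intensity at which the BW-limited slope meets the peak."""
    return peak_ops_per_s / bw_bytes_per_s


PEAK = 1.0e12  # hypothetical peak performance: 1 Tops/s
BW = 2.5e10    # hypothetical memory BW: 25 GB/s

print(inflection_point(PEAK, BW))  # operations/byte needed to reach the peak
for oi in (4, 40, 400):
    print(oi, roofline(oi, PEAK, BW))
```

Below the inflection point (40 operations/byte with these numbers), performance grows linearly with operational intensity and is BW limited; at and above it, performance is capped by the peak and is computation limited.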
The goal of Eyexam is to provide a
fine-grained performance profile for
a DNN processor. It is a sequential
analysis process that involves seven
major steps, as shown in Figure 6. The
process starts with the assumption that
the architecture has infinite processing
parallelism, storage capacity, and data
BW. Therefore, it has infinite performance (as measured in MAC operations
per cycle).