IEEE Circuits and Systems Magazine - Q3 2023 - 40
parallelism, e.g., VMM and MMM, a high compute utilization
can be achieved. The SIMD architecture also
provides opportunities to reuse weights or inputs by
keeping them stationary at the PE's registers to reduce
memory access. However, the memory size and bandwidth
have to scale with the number of PEs in order to
support the full vector processing.
In general, a 1D SIMD array can be tiled into a 2D SIMD
array, where input and weight loading from the external
memory can be shared between 1D tiles to reduce the
bandwidth requirement while scaling up the total number
of PEs for processing.
The SIMD architecture is used extensively in CPUs
and GPUs [19], [20]. Fig. 5 shows a sub-partition of the
streaming multiprocessor (SM) in Nvidia's A100 GPU
[19]. The SM contains functional units (or CUDA cores)
for arithmetic computation. In each cycle, an instruction
is issued to a set of CUDA cores for parallel execution.
B. Systolic Array Architecture
A systolic array consists of a regular 2D array of PEs
where each PE is connected to its immediate neighbors.
Fig. 6(a) presents the architecture of the systolic array.
The inputs are sent to the PEs through a PE array border,
e.g., leftmost column, and the intermediate results
are propagated across the PEs, e.g., horizontally to the
right and vertically to the bottom. Finally, the output are
sent out through another end of the PE array, e.g., bottom
row.
A systolic array's PE microarchitecture and dataflow
are illustrated in Fig. 6(b). A PE is commonly designed
with a multiplier to compute the product of an incoming
input and a cached weight value, and an adder to
sum the computed product and an incoming partial sum
(psum). The updated psum is sent vertically to the next
PE down and the input is propagated horizontally to the
next PE on the right.
To prepare for an MMM operation on a systolic array,
the weights are first loaded to the array. The weight data
are split into column vectors as shown in Fig. 7(a). Each
vector is streamed to and stored in the corresponding
column of the PE array column, as shown in Fig. 7(b).
The steps of an MMM operation are illustrated in
Fig. 7(c)-(e). The input matrix is split into row vectors
that are streamed sequentially to the PE array, as shown
in Fig. 7(c). The inputs propagate from left to right,
passing through one PE in a clock cycle. When an input
enters a PE, the PE computes the product between
the input and the cached weight, and sums the product
with the psum that enters from top. Following the computation,
the PE passes the input to the next PE on the
right and the updated psum to the next PE down. Note
that the inputs to the rows of PE must be arranged with
a one cycle delay from one row to the next to ensure
that the correct psum accumulation. Data move through
the systolic array in waves. The wavefront propagates
diagonally across the systolic array. The outputs are collected
from the bottom row of PEs as shown in Fig. 7(e).
The computation latency of a HW×
systolic array is
HW
+−1 cycles.
A systolic array allows efficient weight reuse. In a
systolic array, the transfer of psums and inputs are restricted
to efficient movements between neighboring
PEs. Due to weight reuse and efficient data movements
Figure 5. Illustration of an example of SIMD architecture in
Nvidia A100 GPU. Adapted from [19].
40
IEEE CIRCUITS AND SYSTEMS MAGAZINE
Figure 6. Illustration of the (a) systolic array architecture and
(b) PE architecture in the systolic array.
THIRD QUARTER 2023
IEEE Circuits and Systems Magazine - Q3 2023
Table of Contents for the Digital Edition of IEEE Circuits and Systems Magazine - Q3 2023
Contents
IEEE Circuits and Systems Magazine - Q3 2023 - Cover1
IEEE Circuits and Systems Magazine - Q3 2023 - Cover2
IEEE Circuits and Systems Magazine - Q3 2023 - Contents
IEEE Circuits and Systems Magazine - Q3 2023 - 2
IEEE Circuits and Systems Magazine - Q3 2023 - 3
IEEE Circuits and Systems Magazine - Q3 2023 - 4
IEEE Circuits and Systems Magazine - Q3 2023 - 5
IEEE Circuits and Systems Magazine - Q3 2023 - 6
IEEE Circuits and Systems Magazine - Q3 2023 - 7
IEEE Circuits and Systems Magazine - Q3 2023 - 8
IEEE Circuits and Systems Magazine - Q3 2023 - 9
IEEE Circuits and Systems Magazine - Q3 2023 - 10
IEEE Circuits and Systems Magazine - Q3 2023 - 11
IEEE Circuits and Systems Magazine - Q3 2023 - 12
IEEE Circuits and Systems Magazine - Q3 2023 - 13
IEEE Circuits and Systems Magazine - Q3 2023 - 14
IEEE Circuits and Systems Magazine - Q3 2023 - 15
IEEE Circuits and Systems Magazine - Q3 2023 - 16
IEEE Circuits and Systems Magazine - Q3 2023 - 17
IEEE Circuits and Systems Magazine - Q3 2023 - 18
IEEE Circuits and Systems Magazine - Q3 2023 - 19
IEEE Circuits and Systems Magazine - Q3 2023 - 20
IEEE Circuits and Systems Magazine - Q3 2023 - 21
IEEE Circuits and Systems Magazine - Q3 2023 - 22
IEEE Circuits and Systems Magazine - Q3 2023 - 23
IEEE Circuits and Systems Magazine - Q3 2023 - 24
IEEE Circuits and Systems Magazine - Q3 2023 - 25
IEEE Circuits and Systems Magazine - Q3 2023 - 26
IEEE Circuits and Systems Magazine - Q3 2023 - 27
IEEE Circuits and Systems Magazine - Q3 2023 - 28
IEEE Circuits and Systems Magazine - Q3 2023 - 29
IEEE Circuits and Systems Magazine - Q3 2023 - 30
IEEE Circuits and Systems Magazine - Q3 2023 - 31
IEEE Circuits and Systems Magazine - Q3 2023 - 32
IEEE Circuits and Systems Magazine - Q3 2023 - 33
IEEE Circuits and Systems Magazine - Q3 2023 - 34
IEEE Circuits and Systems Magazine - Q3 2023 - 35
IEEE Circuits and Systems Magazine - Q3 2023 - 36
IEEE Circuits and Systems Magazine - Q3 2023 - 37
IEEE Circuits and Systems Magazine - Q3 2023 - 38
IEEE Circuits and Systems Magazine - Q3 2023 - 39
IEEE Circuits and Systems Magazine - Q3 2023 - 40
IEEE Circuits and Systems Magazine - Q3 2023 - 41
IEEE Circuits and Systems Magazine - Q3 2023 - 42
IEEE Circuits and Systems Magazine - Q3 2023 - 43
IEEE Circuits and Systems Magazine - Q3 2023 - 44
IEEE Circuits and Systems Magazine - Q3 2023 - 45
IEEE Circuits and Systems Magazine - Q3 2023 - 46
IEEE Circuits and Systems Magazine - Q3 2023 - 47
IEEE Circuits and Systems Magazine - Q3 2023 - 48
IEEE Circuits and Systems Magazine - Q3 2023 - 49
IEEE Circuits and Systems Magazine - Q3 2023 - 50
IEEE Circuits and Systems Magazine - Q3 2023 - 51
IEEE Circuits and Systems Magazine - Q3 2023 - 52
IEEE Circuits and Systems Magazine - Q3 2023 - 53
IEEE Circuits and Systems Magazine - Q3 2023 - 54
IEEE Circuits and Systems Magazine - Q3 2023 - 55
IEEE Circuits and Systems Magazine - Q3 2023 - 56
IEEE Circuits and Systems Magazine - Q3 2023 - 57
IEEE Circuits and Systems Magazine - Q3 2023 - 58
IEEE Circuits and Systems Magazine - Q3 2023 - 59
IEEE Circuits and Systems Magazine - Q3 2023 - 60
IEEE Circuits and Systems Magazine - Q3 2023 - 61
IEEE Circuits and Systems Magazine - Q3 2023 - 62
IEEE Circuits and Systems Magazine - Q3 2023 - 63
IEEE Circuits and Systems Magazine - Q3 2023 - 64
IEEE Circuits and Systems Magazine - Q3 2023 - 65
IEEE Circuits and Systems Magazine - Q3 2023 - 66
IEEE Circuits and Systems Magazine - Q3 2023 - 67
IEEE Circuits and Systems Magazine - Q3 2023 - 68
IEEE Circuits and Systems Magazine - Q3 2023 - 69
IEEE Circuits and Systems Magazine - Q3 2023 - 70
IEEE Circuits and Systems Magazine - Q3 2023 - 71
IEEE Circuits and Systems Magazine - Q3 2023 - 72
IEEE Circuits and Systems Magazine - Q3 2023 - 73
IEEE Circuits and Systems Magazine - Q3 2023 - 74
IEEE Circuits and Systems Magazine - Q3 2023 - 75
IEEE Circuits and Systems Magazine - Q3 2023 - 76
IEEE Circuits and Systems Magazine - Q3 2023 - 77
IEEE Circuits and Systems Magazine - Q3 2023 - 78
IEEE Circuits and Systems Magazine - Q3 2023 - 79
IEEE Circuits and Systems Magazine - Q3 2023 - 80
IEEE Circuits and Systems Magazine - Q3 2023 - 81
IEEE Circuits and Systems Magazine - Q3 2023 - 82
IEEE Circuits and Systems Magazine - Q3 2023 - 83
IEEE Circuits and Systems Magazine - Q3 2023 - 84
IEEE Circuits and Systems Magazine - Q3 2023 - 85
IEEE Circuits and Systems Magazine - Q3 2023 - 86
IEEE Circuits and Systems Magazine - Q3 2023 - 87
IEEE Circuits and Systems Magazine - Q3 2023 - 88
IEEE Circuits and Systems Magazine - Q3 2023 - Cover3
IEEE Circuits and Systems Magazine - Q3 2023 - Cover4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2023Q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2023Q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2023Q1
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2022Q4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2022Q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2022Q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2022Q1
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2021Q4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2021q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2021q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2021q1
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2020q4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2020q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2020q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2020q1
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2019q4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2019q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2019q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2019q1
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2018q4
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2018q3
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2018q2
https://www.nxtbook.com/nxtbooks/ieee/circuitsandsystems_2018q1
https://www.nxtbookmedia.com