IEEE Solid-State Circuits Magazine - Spring 2016 - 61

device. One popular option is to
never allow this, requiring the application on the host processor socket
to marshal any necessary data before
spawning an NDP task (as was done
by Pugsley et al. [28]). Another option
is to simply treat every core as being
part of a full-fledged shared-memory
multiprocessor system, i.e., every
core can issue loads and stores to any
globally visible address regardless of
whether the core resides on the host
processor or on the memory device.
This entails more software/hardware
complexity because it requires the
memory device to maintain a coherent translation look-aside buffer
(TLB) and serve as an originator of
memory requests.
This brings us to yet another key
and somewhat unsolved issue: how
is virtual memory handled? One solution, as suggested in the PEI work [2],
is to leave virtual memory management entirely up to the host processor. When a task is spawned on the
memory device, it is provided the
necessary arguments as physical
addresses; the task is not allowed to
touch data beyond the cache lines (or
pages) that were provided as arguments. Another solution, as suggested
by Pugsley et al. [28], is to organize
the data on a memory device into a
few large pages. This is a good fit
for many big data applications, and
it reduces the overheads associated
with page faults, large TLBs, etc.
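As a rough illustration of why a few large pages ease translation overheads, the sketch below counts how many page translations are needed to map a memory device at different page sizes. The 4-GiB device capacity, the page sizes, and the function name are assumptions chosen for illustration; they are not taken from [28].

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages (one translation each) needed to map a region."""
    # Ceiling division, so a partially filled last page still counts.
    return -(-region_bytes // page_bytes)


# Hypothetical device capacity; not a figure from the article.
DEVICE_BYTES = 4 * GIB

for label, page_bytes in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)]:
    count = pages_needed(DEVICE_BYTES, page_bytes)
    print(f"{label} pages: {count} translations to cover the device")
```

With the assumed capacity, moving from 4-KiB pages to 1-GiB pages shrinks the number of translations from over a million to four, which is small enough to be held in a handful of TLB entries.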

What Microarchitecture Is Best for Near Data Processing?
In most prior work, the cores on the
memory device have been designed
to be "wimpy." While some proposals incorporate full-fledged general-purpose wimpy cores, such as
the 80-mW ARM Cortex-A5 cores in
NDC [28] that can execute entire general-purpose map or reduce tasks,
others only implement custom functional units or accelerators [2], [3],
[15]. In the work of Ahn et al. [2],
the functional unit is only capable
of executing a single PIM-enabled
instruction, and the most complex
functional unit handles dot-product
computations for the words in a
cache line. In another example, Akin
et al. [3] design a 178-mW functional
unit that can permute data.
In addition to these fixed-function
units, we have also seen examples of
reconfigurable accelerators, such as
the use of CGRAs by Farmahini-Farahani et al. [10] and predefined accelerator primitives that can be chained
together to perform more complex
operations [15]. Also, there are proposals to combine general-purpose
wimpy cores and accelerators, e.g.,
for in-memory MapReduce workloads,
Pugsley et al. [26] execute map and
reduce phases on Arm cores, while the
sort phase between map and reduce
is handled by a fixed-function accelerator. These are all compelling design
points on the classic generality versus
efficiency spectrum.
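The division of labor in the MapReduce example above can be sketched in a few lines of Python. This is only a software analogy under stated assumptions: the function names and the word-count workload are hypothetical, and `sort_phase` merely marks the step that would be handed to a fixed-function accelerator while the map and reduce phases stay on the general-purpose cores.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """General-purpose work: emit a (key, value) pair for every word."""
    return [(word, 1) for line in records for word in line.split()]


def sort_phase(pairs):
    """Stand-in for the sort step that a fixed-function unit would handle."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """General-purpose work: sum the values of each run of equal keys."""
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(records):
    # The three phases run back to back; only the middle one is fixed-function.
    return reduce_phase(sort_phase(map_phase(records)))


print(word_count(["near data processing", "near memory"]))
```

The reduce phase relies on its input being sorted by key, which is why the sort sits between the two programmable phases and has a fixed, well-defined job that suits a dedicated unit.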
So why is it best to pursue a
"wimpy" core instead of a low-latency
out-of-order core? An argument that
is frequently cited is the reluctance
to embed a high-power core in a
3D-stacked package for fear that it
may lead to thermal issues. But this
is often a red herring. For example,
adding a few watts to a 13-W HMC
device [18] is unlikely to pose a hazard, especially if some of the external
bandwidth can be eliminated [28]. A
more detailed study by Eckert et al.
[9] makes exactly that argument.
The more credible argument in
favor of wimpy cores is that using them actually leads to higher performance, because
it enables the creation of a throughput-optimized compute substrate
that can leverage the high bandwidth
afforded by NDP. As mentioned earlier, one of the main benefits of NDP
is that plugging in more memory
modules leads to more cores and
a large-scale parallel system. This
is most useful for tasks with high
degrees of parallelism. For such a
highly parallel task, the path to high
performance at a fixed power budget is to use many low-power cores,
not a few high-power cores [28]. To
be more precise, for a highly parallel
task, we can optimize throughput at
a fixed power budget by optimizing
energy per instruction [28]. Therefore, it is best to use cores or accelerators that are optimized for low
energy and not low latency. This also
enables the use of many cores or
accelerators per memory device, an
important requirement if we want to
saturate the available bandwidth.
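The fixed-power-budget argument can be made concrete with a small calculation. The sketch below assumes a perfectly parallel task, so aggregate throughput is simply the number of cores that fit in the budget times the per-core instruction rate, which equals the budget divided by energy per instruction. Only the 80-mW core power comes from the text; the 10-W budget, the per-core instruction rates, and the 5-W "brawny" core are hypothetical values chosen for illustration.

```python
def cores_in_budget(budget_mw: int, core_mw: int) -> int:
    """Number of whole cores that fit within the power budget."""
    return budget_mw // core_mw


def aggregate_mips(budget_mw: int, core_mw: int, core_mips: int) -> int:
    """Aggregate throughput, assuming the task scales perfectly across cores."""
    return cores_in_budget(budget_mw, core_mw) * core_mips


def energy_per_instruction_nj(core_mw: int, core_mips: int) -> float:
    """Energy per instruction in nJ: mW / MIPS = 1e-3 W / 1e6 instr/s = 1e-9 J."""
    return core_mw / core_mips


BUDGET_MW = 10_000           # assumed 10-W budget for compute on the device
CORES = {
    "wimpy": (80, 500),      # 80 mW from the text; 500 MIPS is an assumption
    "brawny": (5_000, 4_000),  # both values are assumptions
}

for name, (core_mw, core_mips) in CORES.items():
    print(
        name,
        cores_in_budget(BUDGET_MW, core_mw), "cores,",
        aggregate_mips(BUDGET_MW, core_mw, core_mips), "MIPS,",
        energy_per_instruction_nj(core_mw, core_mips), "nJ/instr",
    )
```

Under these assumed numbers, 125 wimpy cores deliver 62,500 MIPS, whereas two brawny cores deliver 8,000 MIPS. The core type with the lower energy per instruction wins at a fixed budget, even though each individual core is slower.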

Where Can Processors/Accelerators Be Placed?
About two decades ago, there was
a strong push to place computation on the memory die itself. With
a potentially lower focus on cost
per bit in the future, that approach
may yet have merit. But so far, few
have chosen to revisit that direction.
A few works by Seshadri et al. [33]-[35]
have proposed small changes to
DRAM arrays to support bit manipulations and efficient data movement.
The vast majority of NDP studies in the last few years have focused
on 3D-stacked memory devices. This
approach leaves the DRAM dies relatively untouched, while leveraging
TSVs to support very high intrapackage bandwidth. Localizing the cores/
accelerators on a separate die allows them
to be implemented in a superior logic
process. This approach is often touted
as the solution that offers the benefits
of NDP at relatively low cost, and that
is compatible with the natural evolution of DRAMs (3D stacking). However,
early indications are that 3D-stacked
DRAMs, especially those that include
a logic die, will not be cheap. In certain segments, the cost increase will be
well worth the higher performance.
Given the high cost of 3D-stacked
DRAM, it is worth exploring whether some
(most?) of the benefits of NDP can also
be provided with conventional non-3D-stacked DRAM. This is an area that
is relatively understudied, and more
research needs to be done. One example proposal by Pugsley et al. [27],
NDC-Module, redesigns a DIMM by
placing many simple processor chips
on the DIMM and connecting them
to their adjacent commodity DRAM
chips. The key here is that in a conventional DIMM and server, the on-DIMM
buses can offer very high bandwidth
