IEEE Solid-State Circuits Magazine - Spring 2016 - 60

computation model, with cores having asymmetric views of memory.
A core on a memory device has a
high-bandwidth link (A) to its local
memory and a low-bandwidth link
(C) to nonlocal memory. A core on
the processor socket has a collection of low-bandwidth links (B) to
an aggregation of memory devices.
In terms of bandwidth, A > B > C.
If a computation can localize its
memory accesses to data in a single
memory device, the computation is
best placed on that memory device
so it can exploit the high bandwidth
of A. If a computation has limited
locality, it must determine if it is
better to off-load to the memory
device and exploit a combination of
A and C or remain on the processor
socket and exploit B.
This is clearly a simplistic view
because it considers neither detailed
network topologies nor the quality of the
cores/caches on the processor/memory
devices. But at a high level, it conveys
the key point that a computation with
localized memory accesses is an ideal
candidate for near-data execution.
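The placement trade-off described above can be sketched as a back-of-the-envelope calculation. The Python sketch below is illustrative only: the function names, the example bandwidth figures, and the assumption that transfer time is simply bytes divided by link bandwidth (with local and nonlocal accesses serialized) are hypothetical and are not taken from the article.

```python
def best_placement(total_bytes, local_fraction, bw_a, bw_b, bw_c):
    """Choose where to run a computation using a bandwidth-only model.

    bw_a: memory-device core to its local memory (link A)
    bw_b: processor socket to the aggregated memory devices (links B)
    bw_c: memory-device core to nonlocal memory (link C)
    Bandwidths are in bytes/second; the model expects A > B > C.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    local_bytes = total_bytes * local_fraction
    remote_bytes = total_bytes - local_bytes
    # Off-loaded: local accesses use A, everything else crosses C.
    time_on_device = local_bytes / bw_a + remote_bytes / bw_c
    # Not off-loaded: every access crosses B.
    time_on_host = total_bytes / bw_b
    if time_on_device < time_on_host:
        return "memory device", time_on_device, time_on_host
    return "processor socket", time_on_device, time_on_host


def break_even_local_fraction(bw_a, bw_b, bw_c):
    """Local fraction f at which f/A + (1 - f)/C equals 1/B."""
    return (1 / bw_c - 1 / bw_b) / (1 / bw_c - 1 / bw_a)


# Hypothetical link bandwidths: A = 160 GB/s, B = 40 GB/s, C = 10 GB/s.
A, B, C = 160e9, 40e9, 10e9
print(best_placement(1e9, 1.0, A, B, C)[0])  # memory device
print(best_placement(1e9, 0.5, A, B, C)[0])  # processor socket
```

With these assumed numbers, the break-even point falls at roughly 80% local accesses; below that, the slow link C dominates and the computation is better left on the processor socket.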
A second key component in this
analysis is parallelism. Each additional memory device adds more
memory and compute resources. If
a computation can be parallelized
across the many cores in memory
devices, it is an even better candidate for near-data execution.
There are a few other considerations in the off-load decision.
■ Is one of the cores or one of the
cache hierarchies especially beneficial for the computation at hand?
■ What is the cost of spawning a
task (passing code and arguments
to the memory device)?
■ What is the cost of terminating a
task (returning results to the processor socket)?
■ What is the length of the off-loaded function?
It is nontrivial to factor in these issues
to develop an automatic hardware/software off-load policy. It is, therefore,
an active area of research. We briefly
describe two examples here that represent opposite ends of the spectrum.
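To make the role of spawn cost, termination cost, and function length concrete, the following Python sketch computes the shortest off-loaded function for which off-loading pays off. It is a simplified illustration under stated assumptions: the function name, the per-operation cycle costs, and the linear cost model are hypothetical and do not represent an actual off-load policy from the literature.

```python
def min_profitable_length(spawn_cycles, terminate_cycles,
                          host_cycles_per_op, device_cycles_per_op):
    """Smallest number of operations n for which off-loading wins.

    Off-loading wins when
        spawn + terminate + n * device_cost < n * host_cost.
    Returns None if the memory device is not faster per operation,
    in which case no function length can amortize the overhead.
    All arguments are integer cycle counts.
    """
    saving_per_op = host_cycles_per_op - device_cycles_per_op
    if saving_per_op <= 0:
        return None
    overhead = spawn_cycles + terminate_cycles
    # Strict inequality: n must exceed overhead / saving_per_op.
    return overhead // saving_per_op + 1


# Hypothetical costs: 500 cycles to spawn, 300 to terminate,
# 10 cycles per operation on the host versus 2 on the memory device.
print(min_profitable_length(500, 300, 10, 2))  # 101
```

Under this model, a task with negligible setup cost is worth off-loading even at a length of one operation, whereas a task with heavy setup and teardown must run long enough to amortize that overhead.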

The work of Ahn et al. [2] attempts
fine-granularity off-loads with PIM-enabled instructions (PEIs). Individual
instructions can be executed either
on the host processor or on the memory device (with a locality monitor
that helps in this decision making).
Consider the example where a single
scalar value is being added to some
word that is not currently cache resident. Without PEIs, an entire 64-B
cache line is brought to the processor, an update is performed, and the
entire 64-B cache line is sent back
to memory. With PEIs, the 8-B scalar
value is sent to the memory device
(with appropriate control bits) so the
update can be performed directly in
the memory device.
In this particular example, PEIs
yield a 16× decrease in bandwidth
requirement, the length of the off-loaded function is one instruction, and the cost
of task spawning and termination is
actually lower with PEIs than without PEIs. Ahn et al. also extend the
instruction set architecture (ISA) so
that nontrivial functionalities can
be off-loaded to memory devices.
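The 16× figure in the scalar-update example follows directly from the byte counts given in the text. The short Python sketch below reproduces that arithmetic; the names are illustrative, and the control bits that accompany the PEI are ignored as a simplifying assumption.

```python
CACHE_LINE_BYTES = 64  # cache line size used in the example
SCALAR_BYTES = 8       # size of the scalar operand sent with the PEI


def bytes_without_pei(line_bytes=CACHE_LINE_BYTES):
    # The line travels to the processor and then back to memory.
    return 2 * line_bytes


def bytes_with_pei(operand_bytes=SCALAR_BYTES):
    # Only the scalar operand travels to the memory device.
    return operand_bytes


reduction = bytes_without_pei() / bytes_with_pei()
print(reduction)  # 16.0
```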
Meanwhile, work from our group
[28] focuses on in-memory MapReduce applications that exhibit a very
high degree of locality and task-level
parallelism. Each map and reduce
task is executed on the memory device
that contains its data partition, an approach
dubbed near-data computing (NDC).
Task setup and teardown are nontrivial efforts, especially if data shuffling is required. But that overhead is
palatable because each task executes
for many thousands of cycles. The
task latency itself is highly sensitive
to memory bandwidth. The primary
source of speedup is the high bandwidth within a collection of memory
devices, which is far greater than the
bandwidth into the processor socket.
The PEI approach can yield a nearly
1.5× average speedup for a range of
memory-intensive workloads that do
not exhibit cache line reuse, while the
NDC approach can yield up to 15×
speedup for a specific class of memory-intensive workloads that have high
coarse-grained parallelism. This also
provides insight on the data access patterns of workloads that benefit from
NDP. In-memory MapReduce (e.g., in
SPARK [46]) is a killer app that exhibits
localized memory access and embarrassing levels of coarse-grained parallelism [28]. In-memory MapReduce
frameworks have been shown to be
useful for a wide range of applications:
database operations, analytics, machine
learning, and graph algorithms [46].
The PEI work shows benefits for
a number of graph workloads where
data traversal is random enough that
caches are ineffective, and small computations within each graph vertex
can be off-loaded to memory. The authors
also extend the ISA to perform hash
table probing, histogram bin indexing,
and dot products over cache lines to
accelerate data mining and machine
learning applications. Other papers
have shown NDP benefits for
a range of applications, e.g., graph processing [1], scientific kernels that map to
coarse-grained reconfigurable accelerators (CGRAs) [10], scientific workloads [21], signal processing [15], and
join algorithms [20].

How Should Data Be Organized
Across Memory Devices?
A natural next issue is the interleaving and addressing of data across
several memory devices. Unlike conventional double data rate (DDR)
memory that stripes a single cache
line across multiple DRAM packages,
an entire cache line or even an entire
page in NDP must now be localized to
a single memory package. This allows
the core/accelerator on the memory
package to perform fine- and coarse-grained computation without engaging in complex bit-level manipulations
and without aggregating inputs from
many sources. This already appears to
be the default data mapping in emerging memory devices like the hybrid
memory cube (HMC) [18]. However,
when the same data is accessed by a
host processor socket, this mapping may lead to
longer transfer times.
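The difference between striping a cache line across packages and localizing it to one package can be illustrated with a simple address-mapping sketch. The Python code below assumes a hypothetical round-robin interleaving of fixed-size blocks across devices; the function names, the device count, and the granularities are chosen for illustration and do not describe the actual address mapping of DDR or HMC hardware.

```python
def device_for_address(addr, num_devices, granularity_bytes):
    """Device holding byte address `addr` when consecutive blocks of
    `granularity_bytes` are interleaved round-robin across devices."""
    return (addr // granularity_bytes) % num_devices


def devices_touched(start, length, num_devices, granularity_bytes):
    """Set of devices that hold any byte in [start, start + length)."""
    first_block = start // granularity_bytes
    last_block = (start + length - 1) // granularity_bytes
    return {block % num_devices
            for block in range(first_block, last_block + 1)}


NUM_DEVICES = 8  # hypothetical number of memory packages

# Fine-grained striping (8-B pieces): one 64-B line spans all devices.
print(sorted(devices_touched(0, 64, NUM_DEVICES, 8)))
# Line-granularity mapping: the whole 64-B line sits in one device.
print(sorted(devices_touched(0, 64, NUM_DEVICES, 64)))
# Page-granularity mapping: a whole 4-KB page sits in one device.
print(sorted(devices_touched(4096, 4096, NUM_DEVICES, 4096)))
```

In this sketch, a near-data core can process a line or page without gathering inputs from other devices only under the coarser mappings, while the host loses the parallel transfer that fine-grained striping would have provided.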
It is also important to resolve how
a memory device may potentially
access data in a different memory


