Signal Processing - January 2016 - 108

For example, when applied to G BS, it converges in the scenario of
multiband multiple access channels while it does not converge in the
multiband interference channel case as cycles appear [64]. This is
quite frequent in games with discrete actions. Therefore, learning
algorithms such as the one described in the "Reinforcement Learning" section are not only useful to assume less structure on the problem but also to deal with the discrete case. From now on, we thus
assume that
A k = {a k, 1, f, a k, N k},

(31)

where | A k | 1 + 3. The FP algorithm, introduced by Brown in
1951 [65], is a BRD algorithm in which empirical frequencies are
used. Working with probability distributions is very convenient
mathematically. Although mixed strategies are exploited, this
does not mean that mixed NE are sought. In fact, pure NE can be
shown to be attracting points for all the dynamics, which are considered in this tutorial. This means that, under appropriate conditions, mixed strategies tend to pure strategies as the number of
iterations grows large. The empirical frequency of use of action
a k ! A k for player k ! K at time t + 1 is defined by
r k, a k (t + 1) =

t+1

1 /1
,
t + 1 tl = 1 {a k, tl = a k}

(32)

where 1 is the indicator function. If player k knows r -k, a -k (t)
(i.e., the empirical frequency of use of the action profile a -k at time
t), then it can compute its own expected utility and eventually
choose the action maximizing it. Observe that the computation of
r -k, a -k (t) requires to observe the actions played by the others. As
for BRD, this knowledge can be acquired only through an exchange
of information among the players. For example, in the two-player
CR's dilemma (Example 2), if CR1 has knowledge of the number of
times that CR2 has picked narrowband or widebandup to time
t, then CR1 can easily compute r 2, a 2 (t) through (32).
In its simultaneous form, the FP algorithm operates as follows:
a k (t + 1) ! arg max

ak ! Ak

K

/ r -k, a

-k

(t) u k (a k, a -k) .

(33)

k=1

The important point we want to make about the FP algorithm
concerns the structure of the empirical frequencies. They can be
computed in a recursive fashion as
r k, a k (t + 1) =

t+1

1 /1
= 1
t + 1 tl = 1 {a k, t = a k} t + 1
t

1 1
t + 1 {a = a }
= r k, a k (t) + m FP
k (t) 61 {a k, t + 1 = a k} - r k, a k (t)@,

/ 1 {a

tl = 1

k , t l = a k}

+

k, t + 1

k

(34)

with m FP
k (t) = 1/ (t + 1) . The last line translates the fact that the
empirical frequency at time t + 1 can be computed from its
value at time t and the knowledge of the current action. More
interestingly, it emphasizes a quite general structure, which is
encountered with many iterative and reinforcement learning
(RL) algorithms.

REINFORCEMENT LEARNING
Originally, RL was studied in the context of single-player (or
single-automation) environments with a finite set of actions. A
player receives a numerical utility signal and updates its strategy. The environment provides this signal as a feedback for the
sequence of actions that has be taken by the player. Typically,
the latter relates the utility signal to actions previously taken to
learn a mixed strategy that performs well in terms of average
utility. In a multiplayer setting, RL is inherently more complex
since the learning process itself changes the thing to be learned.
The main objective of this section is to show that feeding back
to the players only the realizations of their utilities is enough to
drive seemingly complex interactions to a steady state or, at
least, to a predictable evolution of the state. In RL algorithms,
players use their experience to choose or avoid certain actions
based on their consequences. Actions that led to satisfactory
outcomes will tend to be repeated in the future, whereas actions
that led to unsatisfactory experiences will be avoided. One of the
first RL algorithms was proposed by Bush and Mosteller in [66],
wherein each player's strategy is defined by the probability of
undertaking each of the available actions. After every player has
selected an action according to its probability, every player
receives the corresponding utility and revises the probability of
undertaking that action according to a reinforcement policy.
More formally, let u k (t) be the value of the utility function of
player k at time t, and denote by r k, a k,n (t) the probability
player k assigns to action a k, n at time t. Then, the Bush and
Mosteller RL algorithm operates as follows:
r k, a k,n (t + 1)= r k, a k,n (t) + m k (t) u k (t) 61 {a k (t) = a k,n} - r k, a k,n (t)@ , (35)
RL

where 0 1 m RL
k (t) 1 1 is a known function that regulates the
learning rate of player k (it plays the same role as the step-size in
the gradient method). The updating rule given by (35) has the
same form of (34), but one of the strengths of the algorithm corresponding to (35) is that each player only needs to observe the
realization of its utility function and nothing else. It can, therefore, be applied to any finite game. Convergence is ensured for
classes of games such as potential games and supermodular
games, introduced in the "Special Classes of Strategic-Form
Games" section. The convergence of RL algorithms is also
ensured for dominance solvable games [22], which are not
treated in this tutorial due to space limitations. As for the BRD,
convergence points are either pure NE or boundary points. The
price to be paid for the high flexibility regarding the environment
and the absence of strong assumptions on its structure is that the
RL algorithm in (35) usually requires a large number of iterations
to converge compared to the BRD algorithm.
All the aforementioned distributed algorithms (i.e., the BRD
algorithm, the FP algorithm, and the considered RL algorithm)
are attractive since they only rely on partial knowledge of the
problem. On the other hand, convergence points are typically
pure NE, which, in most cases, are inefficient. Often, points that
Pareto-dominate the NE points can be shown to exist. A nontrivial problem is how to reach one of them in a distributed manner.

IEEE SIGNAL PROCESSING MAGAZINE [108] JANuARy 2016

Table of Contents for the Digital Edition of Signal Processing - January 2016

Signal Processing - January 2016 - Cover1
Signal Processing - January 2016 - Cover2
Signal Processing - January 2016 - 1
Signal Processing - January 2016 - 2
Signal Processing - January 2016 - 3
Signal Processing - January 2016 - 4
Signal Processing - January 2016 - 5
Signal Processing - January 2016 - 6
Signal Processing - January 2016 - 7
Signal Processing - January 2016 - 8
Signal Processing - January 2016 - 9
Signal Processing - January 2016 - 10
Signal Processing - January 2016 - 11
Signal Processing - January 2016 - 12
Signal Processing - January 2016 - 13
Signal Processing - January 2016 - 14
Signal Processing - January 2016 - 15
Signal Processing - January 2016 - 16
Signal Processing - January 2016 - 17
Signal Processing - January 2016 - 18
Signal Processing - January 2016 - 19
Signal Processing - January 2016 - 20
Signal Processing - January 2016 - 21
Signal Processing - January 2016 - 22
Signal Processing - January 2016 - 23
Signal Processing - January 2016 - 24
Signal Processing - January 2016 - 25
Signal Processing - January 2016 - 26
Signal Processing - January 2016 - 27
Signal Processing - January 2016 - 28
Signal Processing - January 2016 - 29
Signal Processing - January 2016 - 30
Signal Processing - January 2016 - 31
Signal Processing - January 2016 - 32
Signal Processing - January 2016 - 33
Signal Processing - January 2016 - 34
Signal Processing - January 2016 - 35
Signal Processing - January 2016 - 36
Signal Processing - January 2016 - 37
Signal Processing - January 2016 - 38
Signal Processing - January 2016 - 39
Signal Processing - January 2016 - 40
Signal Processing - January 2016 - 41
Signal Processing - January 2016 - 42
Signal Processing - January 2016 - 43
Signal Processing - January 2016 - 44
Signal Processing - January 2016 - 45
Signal Processing - January 2016 - 46
Signal Processing - January 2016 - 47
Signal Processing - January 2016 - 48
Signal Processing - January 2016 - 49
Signal Processing - January 2016 - 50
Signal Processing - January 2016 - 51
Signal Processing - January 2016 - 52
Signal Processing - January 2016 - 53
Signal Processing - January 2016 - 54
Signal Processing - January 2016 - 55
Signal Processing - January 2016 - 56
Signal Processing - January 2016 - 57
Signal Processing - January 2016 - 58
Signal Processing - January 2016 - 59
Signal Processing - January 2016 - 60
Signal Processing - January 2016 - 61
Signal Processing - January 2016 - 62
Signal Processing - January 2016 - 63
Signal Processing - January 2016 - 64
Signal Processing - January 2016 - 65
Signal Processing - January 2016 - 66
Signal Processing - January 2016 - 67
Signal Processing - January 2016 - 68
Signal Processing - January 2016 - 69
Signal Processing - January 2016 - 70
Signal Processing - January 2016 - 71
Signal Processing - January 2016 - 72
Signal Processing - January 2016 - 73
Signal Processing - January 2016 - 74
Signal Processing - January 2016 - 75
Signal Processing - January 2016 - 76
Signal Processing - January 2016 - 77
Signal Processing - January 2016 - 78
Signal Processing - January 2016 - 79
Signal Processing - January 2016 - 80
Signal Processing - January 2016 - 81
Signal Processing - January 2016 - 82
Signal Processing - January 2016 - 83
Signal Processing - January 2016 - 84
Signal Processing - January 2016 - 85
Signal Processing - January 2016 - 86
Signal Processing - January 2016 - 87
Signal Processing - January 2016 - 88
Signal Processing - January 2016 - 89
Signal Processing - January 2016 - 90
Signal Processing - January 2016 - 91
Signal Processing - January 2016 - 92
Signal Processing - January 2016 - 93
Signal Processing - January 2016 - 94
Signal Processing - January 2016 - 95
Signal Processing - January 2016 - 96
Signal Processing - January 2016 - 97
Signal Processing - January 2016 - 98
Signal Processing - January 2016 - 99
Signal Processing - January 2016 - 100
Signal Processing - January 2016 - 101
Signal Processing - January 2016 - 102
Signal Processing - January 2016 - 103
Signal Processing - January 2016 - 104
Signal Processing - January 2016 - 105
Signal Processing - January 2016 - 106
Signal Processing - January 2016 - 107
Signal Processing - January 2016 - 108
Signal Processing - January 2016 - 109
Signal Processing - January 2016 - 110
Signal Processing - January 2016 - 111
Signal Processing - January 2016 - 112
Signal Processing - January 2016 - 113
Signal Processing - January 2016 - 114
Signal Processing - January 2016 - 115
Signal Processing - January 2016 - 116
Signal Processing - January 2016 - 117
Signal Processing - January 2016 - 118
Signal Processing - January 2016 - 119
Signal Processing - January 2016 - 120
Signal Processing - January 2016 - 121
Signal Processing - January 2016 - 122
Signal Processing - January 2016 - 123
Signal Processing - January 2016 - 124
Signal Processing - January 2016 - 125
Signal Processing - January 2016 - 126
Signal Processing - January 2016 - 127
Signal Processing - January 2016 - 128
Signal Processing - January 2016 - 129
Signal Processing - January 2016 - 130
Signal Processing - January 2016 - 131
Signal Processing - January 2016 - 132
Signal Processing - January 2016 - 133
Signal Processing - January 2016 - 134
Signal Processing - January 2016 - 135
Signal Processing - January 2016 - 136
Signal Processing - January 2016 - 137
Signal Processing - January 2016 - 138
Signal Processing - January 2016 - 139
Signal Processing - January 2016 - 140
Signal Processing - January 2016 - 141
Signal Processing - January 2016 - 142
Signal Processing - January 2016 - 143
Signal Processing - January 2016 - 144
Signal Processing - January 2016 - 145
Signal Processing - January 2016 - 146
Signal Processing - January 2016 - 147
Signal Processing - January 2016 - 148
Signal Processing - January 2016 - 149
Signal Processing - January 2016 - 150
Signal Processing - January 2016 - 151
Signal Processing - January 2016 - 152
Signal Processing - January 2016 - 153
Signal Processing - January 2016 - 154
Signal Processing - January 2016 - 155
Signal Processing - January 2016 - 156
Signal Processing - January 2016 - 157
Signal Processing - January 2016 - 158
Signal Processing - January 2016 - 159
Signal Processing - January 2016 - 160
Signal Processing - January 2016 - 161
Signal Processing - January 2016 - 162
Signal Processing - January 2016 - 163
Signal Processing - January 2016 - 164
Signal Processing - January 2016 - 165
Signal Processing - January 2016 - 166
Signal Processing - January 2016 - 167
Signal Processing - January 2016 - 168
Signal Processing - January 2016 - Cover3
Signal Processing - January 2016 - Cover4
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201809
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201807
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201805
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201803
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201801
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0917
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0717
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0517
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0317
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0916
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0716
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0516
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0316
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0915
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0715
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0515
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0315
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0914
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0714
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0514
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0314
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0913
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0713
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0513
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0313
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0912
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0712
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0512
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0312
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0911
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0711
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0511
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0311
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0910
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0710
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0510
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0310
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0909
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0709
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0509
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0309
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1108
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0908
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0708
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0508
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0308
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0108
https://www.nxtbookmedia.com