IEEE Robotics & Automation Magazine - June 2020 - 49

$$\theta \leftarrow \theta - \alpha \cdot \nabla_\theta J(\theta). \tag{1}$$

The feedback given by the human indicates only the sign of the policy error. Its magnitude is assumed to be unknown, since the algorithm works under the assumption that the user is a non-expert and therefore does not know the magnitude of the proper action. Instead, the error magnitude is set by the hyperparameter $e$, which must be chosen before starting the learning process. Thus, the policy error is defined as $\text{error}_t = h_t \cdot e$, where $h_t$ is the human's corrective feedback signal.
To compute a gradient in the parameter space of the policy, the error needs to be a function of $\theta$. This is achieved by observing that

$$\text{error}_t(\theta) = a_t^{\text{target}} - \pi_\theta(o_t), \tag{2}$$

where $a_t^{\text{target}}$ is the incremental objective generated by the feedback of the human, $a_t^{\text{target}} = a_t + \text{error}_t$, and $a_t$ is the current output of the policy $\pi_\theta$. From (1), (2), and the derivative of the mean squared error, we can get the COACH update step:

$$\theta \leftarrow \theta + \alpha \cdot \text{error}_t \cdot \nabla_\theta \pi_\theta. \tag{3}$$
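As a minimal sketch of the update step (3), the snippet below applies one COACH update to a linear policy $\pi_\theta(o) = \theta^\top o$. The linear policy, the function names, and the numeric values are assumptions chosen for illustration only; for such a policy the gradient $\nabla_\theta \pi_\theta(o_t)$ is simply $o_t$.

```python
def policy(theta, obs):
    """Linear policy pi_theta(o) = theta . o (an illustrative assumption)."""
    return sum(w * x for w, x in zip(theta, obs))


def coach_update(theta, obs, h, e, alpha):
    """One step of theta <- theta + alpha * error_t * grad_theta pi_theta(o_t)."""
    error = h * e   # signed error: feedback sign h times the magnitude e
    grad = obs      # for a linear policy, d pi / d theta equals o_t
    return [w + alpha * error * g for w, g in zip(theta, grad)]


theta = [0.0, 0.0]
obs = [1.0, 2.0]

before = policy(theta, obs)
theta = coach_update(theta, obs, h=+1, e=0.5, alpha=0.1)
after = policy(theta, obs)
```

With positive feedback (`h=+1`), the policy output for the same observation increases, which is the direction the human asked for; negative feedback would decrease it.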

To be more data efficient and avoid locally overfitting to the most recent corrections, D-COACH has a memory buffer that stores the tuple $(o_t, a_t^{\text{target}})$ and replays this information during learning. Additionally, when working on problems with high-dimensional observations, an autoencoding cost is incorporated in D-COACH as an observation reconstruction SRL strategy. In the D-COACH pseudocode (Algorithm 2), this SRL step is omitted. D-COACH learns everything from scratch through only one interactive phase, unlike other deep interactive RL approaches [4], [6], which split the learning process into two sequential learning phases: first, recording samples of the environment for training a dimensionality reduction model (e.g., an autoencoder) and, second, using that model for the input of the policy network during the actual interactive learning process.
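The loop of Algorithm 2 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the policy is linear rather than a neural network, the human is replaced by a hypothetical `simulated_teacher` that knows a desired action, and `lr`, `batch_size`, and `tol` are illustrative parameters that do not appear in the article.

```python
import random


def policy(theta, obs):
    """Linear stand-in for pi_theta(o); the article uses a neural network."""
    return sum(w * x for w, x in zip(theta, obs))


def sgd_step(theta, batch, lr):
    """One gradient step on half the mean squared error over (o, a_target) pairs."""
    grad = [0.0] * len(theta)
    for obs, a_target in batch:
        err = a_target - policy(theta, obs)
        for i, x in enumerate(obs):
            grad[i] -= err * x / len(batch)
    return [w - lr * g for w, g in zip(theta, grad)]


def d_coach(teacher, observations, e=0.2, b=5, lr=0.05, batch_size=8, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(observations[0])
    buffer = []  # memory buffer B

    def replay(current_theta):
        batch = rng.sample(buffer, min(batch_size, len(buffer)))
        return sgd_step(current_theta, batch, lr)

    for t, obs in enumerate(observations, start=1):
        a_t = policy(theta, obs)
        h_t = teacher(obs, a_t)              # -1, 0, or +1
        if h_t != 0:
            a_target = a_t + h_t * e         # error_t = h_t * e
            theta = sgd_step(theta, [(obs, a_target)], lr)
            if buffer:                       # B is empty on the first correction
                theta = replay(theta)
            buffer.append((obs, a_target))
        if t % b == 0 and buffer:            # periodic replay every b steps
            theta = replay(theta)
    return theta, buffer


def desired_action(obs):
    """Hypothetical ground truth known only to the simulated teacher."""
    return 1.0 * obs[0] - 0.5 * obs[1]


def simulated_teacher(obs, action, tol=0.05):
    diff = desired_action(obs) - action
    if abs(diff) < tol:
        return 0
    return 1 if diff > 0 else -1


def mean_abs_error(theta, observations):
    total = sum(abs(desired_action(o) - policy(theta, o)) for o in observations)
    return total / len(observations)


obs_rng = random.Random(1)
observations = [(obs_rng.uniform(-1, 1), obs_rng.uniform(-1, 1)) for _ in range(500)]

initial_error = mean_abs_error([0.0, 0.0], observations)
theta, buffer = d_coach(simulated_teacher, observations)
final_error = mean_abs_error(theta, observations)
```

Note that the teacher only ever supplies a sign; the step size toward the desired action is fixed by `e`, so replayed targets are approximate and become stale as the policy improves.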

Learning Temporal Features Based on Interactive Teaching and World Modeling
In this section, the SRL NN architecture is described along with the interactive algorithm for policy shaping.

Network Architecture for Extracting Temporal Features
When approaching problems that lack temporal information in the observations, the most common solution is to model the policy with RNNs, as discussed in the "Dealing With Non-Markovian Environments" section; therefore, we propose to shape policies that are built on top of RNNs, with occasional human feedback. In this article, we use the terms world model and transition model interchangeably.

IIL methods can take advantage of SRL for training with other objective functions by 1) making use of all of the experience collected in every time step and 2) boosting the process of finding compact Markovian embeddings. We propose a neural architecture separated into two parts: 1) the transition model and 2) the policy. The transition model is in charge of learning the dynamics of the environment in a supervised manner, using samples collected by the agent. The policy part is shaped using only corrective feedback. Figure 2 shows this architecture.

Learning to predict the next observation $o_{t+1}$ forces a Markovian SR. This has been successfully applied in RL [16]. RNNs can encode information from past observations in their hidden state $h_t^{\text{LSTM}}$. Thus, the objective of the first part of the NN is to learn $M(o_t, a_t, h_{t-1}^{\text{LSTM}}) = \tilde{o}_{t+1}$, which, as a consequence, learns to embed past observations in $h_t^{\text{LSTM}}$. Additionally, when the observations are high-dimensional (raw images), the agents also need to learn to compress spatial information. To achieve this, a common approach is to compress this information in the latent space of an autoencoder.

For the first part of the architecture, we propose a combination of an autoencoder with an LSTM to compute the

Algorithm 2: D-COACH
Require: error magnitude $e$, buffer update interval $b$
Init: $B = []$  # initialize memory buffer
for $t = 1, 2, \ldots$ do
    observe state $o_t$
    execute action $a_t = \pi_\theta(o_t)$
    receive human corrective advice $h_t$
    if $h_t$ is not 0 then
        $\text{error}_t = h_t \cdot e$
        $a_t^{\text{target}} = a_t + \text{error}_t$
        update $\pi_\theta$ using SGD with pair $(o_t, a_t^{\text{target}})$
        update $\pi_\theta$ using SGD with a minibatch sampled from $B$
        append $(o_t, a_t^{\text{target}})$ to $B$
    if mod($t$, $b$) is 0 then
        update $\pi_\theta$ using SGD with a minibatch sampled from $B$

Figure 2. The general structure of the transition model and policy. (Diagram labels: $o_t$, $a_t$, $h_{t-1}^{\text{LSTM}}$, Transition Model, $h_t^{\text{LSTM}}$, $\tilde{o}_{t+1}$, $s_t$, Policy, $a_t$.)
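The interface of the transition model, $M(o_t, a_t, h_{t-1}^{\text{LSTM}}) = \tilde{o}_{t+1}$, can be illustrated with a toy recurrent cell. This is only a sketch of the data flow: a single untrained tanh cell with random weights stands in for the autoencoder and LSTM combination, and the class name, dimensions, and `prediction_loss` helper are hypothetical.

```python
import math
import random


class RecurrentTransitionModel:
    """Toy stand-in for M(o_t, a_t, h_{t-1}) -> (predicted o_{t+1}, h_t).

    A plain tanh recurrent cell replaces the autoencoder + LSTM of the
    article; the weights are random and untrained (illustration only).
    """

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + act_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                    for _ in range(hidden_dim)]
        self.w_o = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                    for _ in range(obs_dim)]

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, action, h_prev):
        # The new hidden state depends on the current inputs and on h_prev,
        # so it can carry information from past observations.
        x = list(obs) + list(action) + list(h_prev)
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w_h]
        o_pred = [sum(w * v for w, v in zip(row, h)) for row in self.w_o]
        return o_pred, h


def prediction_loss(o_pred, o_next):
    """Mean squared error between predicted and observed next observation."""
    return sum((p - q) ** 2 for p, q in zip(o_pred, o_next)) / len(o_next)


model = RecurrentTransitionModel(obs_dim=2, act_dim=1, hidden_dim=4)
h = model.initial_state()

# The first and third steps share the same (o_t, a_t) but not the same history.
trajectory = [([0.1, -0.2], [1.0]), ([0.3, 0.0], [-1.0]), ([0.1, -0.2], [1.0])]
predictions = []
for obs, action in trajectory:
    o_pred, h = model.step(obs, action, h)
    predictions.append(o_pred)
```

Because the hidden state differs between the first and third steps, the two predictions differ even though the observation and action are identical, which is the property that lets a recurrent model compensate for observations that lack temporal information. Training would minimize `prediction_loss` against the observed $o_{t+1}$ in a supervised manner.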
