IEEE Robotics & Automation Magazine - March 2016 - 97
System identification methods can be used to build a parameterized model, and the acquired model can then generate virtual samples for policy improvement [1], [4], [14], [17], [22], [35],
[36]. However, when such a model is used for policy improvement, the generalization performance of the learned simulation
model needs to be carefully verified. Even though a simulation model can be useful for predicting state transitions, explicitly predicting them is not
necessary for policy updates.
In this article, we propose using previously acquired data
as a simulation model instead of building a parameterized
simulation model. To utilize the previously acquired data to
improve the current policy, we need to reevaluate the previous
data in terms of the current policy. To do this in reinforcement-learning (RL) frameworks, importance-weighted policy
gradients with parameter-based exploration (IW-PGPE) can
be used [34]. The usefulness of this approach has been thoroughly evaluated by comparisons in numerical simulations
with previously proposed RL methods [6], [9], [15], [25], [29].
However, no work has clearly stated and shown that this
particular combination of PGPE and importance weighting
to derive the gradient of objective functions is suitable for
real robot learning. In our previous study, we presented
preliminary results and showed that IW-PGPE is a useful
approach for humanoid motor learning [23]. In this article,
we extend our IW-PGPE algorithm to more efficiently
reuse previous experiences by introducing a recursive operation into the policy updates, and we show how our extended algorithm is useful for real robot learning in high-dimensional
spaces. We successfully applied our proposed approach to
two different tasks under two different conditions. First, we
applied it to a cart-pole swing-up task in a real-virtual hybrid environment with a PS Move motion controller. Then
we applied it to a challenging basketball-shooting task in a
real environment.
Motor Learning Framework
Trajectories and Returns
We assume that the underlying control problem is a discrete-time Markov decision process. At each discrete time step $t$, the agent observes state $x(t) \in \mathcal{X}$, selects action $u(t) \in \mathcal{U}$, and receives an immediate reward $r(t)$ resulting from a state transition in the environment. The environment's dynamics are characterized by $p(x(t+1) \mid x(t), u(t))$, which represents the transition probability density from the current state $x(t)$ to the subsequent state $x(t+1)$ when action $u(t)$ is taken; $p(x(1))$ is the probability density of the initial states. The immediate reward $r(t)$ is given by the reward function $r(x(t), u(t), x(t+1))$.
The robot's decision-making procedure at each time step $t$ is characterized by a parameterized policy $p(u(t) \mid x(t), w)$ with parameter $w$, which represents the conditional probability density of taking action $u(t)$ in state $x(t)$. We assume that the policy is continuously differentiable with respect to the parameter $w$.
A sequence of states and actions forms a trajectory denoted by $h := [x(1), u(1), \ldots, x(T), u(T)]$, where $T$ denotes the number of steps, called the horizon length. We assume that $T$ is a fixed, deterministic number. The discounted cumulative reward along $h$, called the return, is then given by
$$R(h) := \sum_{t=1}^{T-1} \gamma^{t-1} r(x(t), u(t)) + U(x(T)), \tag{1}$$
where $\gamma \in [0, 1)$ is the discount factor for future rewards, and $r(x(t), u(t))$ and $U(x(T))$ are the immediate and terminal rewards, respectively.
Policy Models
In this article, we consider feedback and feedforward policy models.
Feedback Policy Model
We use locally linear state-dependent basis functions in our feedback policy model [20], [21], [31]:
$$u(t) = W^{\mathrm{fb}} \phi^{\mathrm{fb}}(z(t)), \tag{2}$$
where $z(t) \in \mathcal{Z}$ is the feedback state at time $t$. Note that the state space $\mathcal{Z}$ can be a subset of the original state space, i.e., $\mathcal{Z} \subset \mathcal{X}$. $W^{\mathrm{fb}}$ is a state-dependent matrix, and $\phi^{\mathrm{fb}}(z(t)) \in \mathbb{R}^{M}$ is a vector that consists of state-dependent basis functions, where $M$ is the number of basis functions.
Feedforward Policy Model
The feedforward policy model is formulated as
$$u(t) = W^{\mathrm{ff}} \phi^{\mathrm{ff}}(t), \tag{3}$$
where $w \in \mathbb{R}^{M}$ is the parameter vector, $\phi^{\mathrm{ff}}(t) \in \mathbb{R}^{M \times 1}$ is a vector that consists of time-dependent basis functions, and $W^{\mathrm{ff}}$ is a parameter matrix.
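The following Python sketch evaluates the return in (1) for a recorded reward sequence and computes actions with the feedback model (2) and the feedforward model (3). This section does not specify the basis functions, so the normalized Gaussian bases, their centers, and their width are hypothetical stand-ins, as are all function names.

```python
import numpy as np


def discounted_return(rewards, terminal_reward, gamma):
    """Return R(h) from (1).

    rewards[k] holds r(x(t), u(t)) for t = k + 1, covering t = 1, ..., T - 1;
    terminal_reward holds the terminal reward U(x(T)) (not discounted, as in (1)).
    """
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))  # gamma^(t-1) for t = 1, ..., T-1
    return float(discounts @ rewards + terminal_reward)


def gaussian_basis(s, centers, width):
    """Hypothetical normalized Gaussian basis vector, one entry per center (M total)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    centers = np.asarray(centers, dtype=float)      # shape (M, dim)
    sq_dist = np.sum((centers - s) ** 2, axis=1)    # squared distance to each center
    activations = np.exp(-0.5 * sq_dist / width ** 2)
    return activations / activations.sum()          # entries sum to 1


def feedback_action(W_fb, z, centers, width):
    """u(t) = W^fb phi^fb(z(t)) as in (2); the bases depend on the feedback state z(t)."""
    return np.asarray(W_fb, dtype=float) @ gaussian_basis(z, centers, width)


def feedforward_action(W_ff, t, time_centers, width):
    """u(t) = W^ff phi^ff(t) as in (3); the bases depend only on the time step t."""
    time_centers = np.asarray(time_centers, dtype=float).reshape(-1, 1)
    return np.asarray(W_ff, dtype=float) @ gaussian_basis(t, time_centers, width)
```

For example, `discounted_return([1.0, 1.0, 1.0], 2.0, 0.5)` gives 1 + 0.5 + 0.25 + 2 = 3.75. Keeping the basis evaluation separate from the matrix mirrors the structure of (2) and (3): in this sketch, only the matrix would be adjusted by learning, while the bases stay fixed.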
Table 1. The supplemental equations for the low-level controller and the gradient of the objective function with respect to the policy parameters.
1.1) The expected return in the PGPE formulation is defined in terms of the expectations over both $h$ and $w$ as a function of the hyperparameter $\rho$:
$$J(\rho) := \iint p(h; w)\, p(w; \rho)\, R(h)\, dh\, dw.$$
1.2) The derivative of the expected return, using the logarithmic derivative $\nabla_\rho \log p(w; \rho) = \dfrac{\nabla_\rho p(w; \rho)}{p(w; \rho)}$:
$$\nabla_\rho J(\rho) = \iint p(h; w)\, p(w; \rho)\, \nabla_\rho \log p(w; \rho)\, R(h)\, dh\, dw.$$
1.3) The expectations over $h$ and $w$ are approximated by empirical averages:
$$\nabla_\rho \hat{J}(\rho) = \frac{1}{N} \sum_{n=1}^{N} \nabla_\rho \log p(w_n; \rho)\, R(h_n).$$
2) We can acquire the gradient information for policy updates that are weighted by the importance weight $v$:
$$\nabla_\rho \hat{J}_{\mathrm{IW}}(\rho) := \frac{1}{N'} \sum_{n=1}^{N'} v(w'_n)\, \nabla_\rho \log p(w'_n; \rho)\, R(h'_n).$$
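Items 1.3 and 2 of Table 1 can be sketched in Python as follows. Two assumptions are not stated in the table and are labeled here: the hyperparameter distribution p(w; rho) is taken to be a diagonal Gaussian with mean eta and standard deviation tau, as in standard PGPE, and the importance weight is taken to be the density ratio v(w) = p(w; rho) / p(w; rho'), where rho' is the hyperparameter under which the previous samples were drawn. Baseline subtraction and the recursive reuse operation described in the article are omitted, and all function names are hypothetical.

```python
import numpy as np


def log_gaussian(w, eta, tau):
    """log p(w; rho) for the assumed diagonal Gaussian with rho = (eta, tau)."""
    z = (w - eta) / tau
    return float(np.sum(-0.5 * z ** 2 - np.log(tau) - 0.5 * np.log(2.0 * np.pi)))


def grad_log_gaussian(w, eta, tau):
    """Per-dimension gradient of log p(w; rho) with respect to eta and tau."""
    d_eta = (w - eta) / tau ** 2
    d_tau = ((w - eta) ** 2 - tau ** 2) / tau ** 3
    return d_eta, d_tau


def pgpe_gradient(ws, returns, eta, tau, weights=None):
    """Empirical gradient of item 1.3; passing `weights` gives the IW estimate of item 2.

    ws: (N, M) sampled policy parameters, returns: (N,) returns R(h_n).
    """
    ws = np.asarray(ws, dtype=float)
    returns = np.asarray(returns, dtype=float)
    eta = np.asarray(eta, dtype=float)
    tau = np.asarray(tau, dtype=float)
    if weights is None:
        weights = np.ones(len(returns))  # plain PGPE: every sample has weight 1
    g_eta = np.zeros_like(eta)
    g_tau = np.zeros_like(tau)
    for w, R, v in zip(ws, returns, weights):
        d_eta, d_tau = grad_log_gaussian(w, eta, tau)
        g_eta += v * d_eta * R
        g_tau += v * d_tau * R
    n = len(returns)  # N (item 1.3) or N' (item 2)
    return g_eta / n, g_tau / n


def importance_weights(ws, eta, tau, eta_old, tau_old):
    """Assumed weights v(w) = p(w; rho) / p(w; rho'), computed in log space for stability."""
    ws = np.asarray(ws, dtype=float)
    return np.array([
        np.exp(log_gaussian(w, eta, tau) - log_gaussian(w, eta_old, tau_old))
        for w in ws
    ])
```

Because the trajectory density p(h; w) is the same under both hyperparameters, it cancels in the ratio, so previously collected samples can be reweighted without a model of the environment dynamics. When rho equals rho', every weight is 1 and the estimate reduces to the plain empirical average in item 1.3.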