IEEE Robotics & Automation Magazine - June 2022 - 80

● Step A: online RL of a control policy, using privileged information
● Step B: policy analysis and sample collection, using privileged information
● Step C: offline supervised learning of the SE
● Step D: online adaptation learning of the control policy to the SE.
We rely on privileged information in the form of rich and
accurate state measurements, which are often available in a
lab setting via external sensing, such as motion capture. We
also rely on a means of automating sample collection with a
specified distribution. In the case of the Furuta pendulum,
state measurements are readily available from the joint encoders,
and samples can be easily gathered using a standard combination
of energy-pumping and linear-quadratic regulator
(LQR) controllers. An important benefit of automating sample
collection is that it makes it possible to quickly and easily
collect new data sets. As is always the case, development is an
iterative process, and speeding it up is critical, yet seldom discussed
in the literature [7].
Step A: Learning the Control Policy
To focus on a reliable and sample-efficient training process,
we train a Proximal Policy Optimization (PPO) [19] RL agent
using privileged information as input. In the case of the Furuta
platform, the agent learns to swing up and balance the pole
in approximately 12 h of interaction time, which is equivalent
to 8 h of samples gathered for learning and 4 h for resetting.
The entire process is automated and could be run in a single
session without any intervention.
To enable the agent to learn this task reliably, it was important
to tune the reward function and adjust hyperparameters
based on knowledge about the system. We use a continuous
reward function, which accelerates training by providing a
reliable, steady increase in the accumulated reward. For the
Furuta pendulum, we use a quadratic reward penalizing the
angle positions of the pendulum, with
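[Equation (1): the reward r_t as a function of the absolute angles |α_t| and |θ_t|, each normalized by 180°, weighted by 4/5 and 1/5, and including a squared term. The exact arrangement of terms is not recoverable from the extracted text.]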
We train the agent with a small learning rate and clipping
factor (see Table S1), which also helps to reliably increase the
reward across training episodes. Agents with a large learning
rate learned the swing-up task more quickly but were not able
to learn to balance the pendulum reliably: they were susceptible
to "fatal forgetting," or sudden large drops in their reward.
We surmise that this is because balancing requires very precise
control inputs and therefore a smaller learning rate.
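To make the role of the clipping factor concrete, the following minimal sketch evaluates the standard PPO clipped surrogate term for a single sample. It is not the training code used here; the function name and the value of `clip_eps` are illustrative assumptions, and the values actually used are those listed in Table S1.

```python
def ppo_clipped_term(ratio, advantage, clip_eps):
    """Per-sample PPO clipped surrogate objective.

    ratio:     new-policy probability divided by old-policy probability
    advantage: advantage estimate for the sampled action
    clip_eps:  clipping factor (illustrative value, not taken from Table S1)
    """
    # Restrict the probability ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum yields a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    for ratio, adv in [(1.5, 1.0), (0.5, -1.0), (1.5, -1.0)]:
        print(ratio, adv, ppo_clipped_term(ratio, adv, clip_eps=0.1))
```

With a small `clip_eps`, a sample whose ratio already lies outside the interval in the favorable direction contributes no further gradient, which limits how far a single update can move the policy.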
Step B: Policy Analysis and Data Collection
Based on the control policy trained on privileged information,
we empirically identify minimum precision requirements by
injecting noise into the state until the task can no longer be
fulfilled. This threshold is then used as the convergence criterion
for "Step C: Learning Precise SE." For the Furuta pendulum,
we add zero-mean Gaussian noise to the angles and
propagate it via finite differences to the angular velocities. At
a sampling frequency of 120 Hz, the agent can tolerate noise
with a standard deviation of less than 1°. We also noticed that this
level of precision is necessary only to balance the pendulum
near the equilibrium point; the policy is able to swing up the
pendulum even with higher noise. Based on this observation,
we separately collect data for the swing-up and balancing
portions of the task (see "Reproducible Platform"). The convergence
criterion is then tested only on images relevant for
balancing, which we heuristically determined as |α| < 10°.
As we will see in the "Step C: Learning Precise SE" section,
converging to high precision across the entire state space
requires not only more training time but also a larger DNN.
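The noise-injection analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure's actual code: the function names are hypothetical, `task_succeeds` stands in for a full rollout of the trained policy on the platform, and the angle noise is assumed to be independent between samples.

```python
import random

SAMPLE_RATE_HZ = 120.0  # sampling frequency stated in the text
DT = 1.0 / SAMPLE_RATE_HZ


def noisy_states(angles_deg, sigma_deg, rng=None):
    """Add zero-mean Gaussian noise to the angles and recompute the angular
    velocities by backward finite differences.

    Returns a list of (noisy_angle_deg, velocity_deg_per_s) tuples.
    """
    rng = rng or random.Random()
    noisy = [a + rng.gauss(0.0, sigma_deg) for a in angles_deg]
    states = []
    for k, angle in enumerate(noisy):
        # The first sample has no predecessor, so its velocity is set to zero.
        velocity = 0.0 if k == 0 else (angle - noisy[k - 1]) / DT
        states.append((angle, velocity))
    return states


def velocity_noise_std(sigma_deg, dt=DT):
    """Standard deviation of the velocity error caused by independent angle
    noise passing through a backward difference: sqrt(2) * sigma / dt."""
    return (2.0 ** 0.5) * sigma_deg / dt


def max_tolerated_sigma(task_succeeds, sigmas_deg):
    """Sweep the noise level upward and return the largest value reached
    before the first failure (None if even the smallest level fails).

    task_succeeds is a hypothetical callback: sigma_deg -> bool.
    """
    tolerated = None
    for sigma in sorted(sigmas_deg):
        if not task_succeeds(sigma):
            break
        tolerated = sigma
    return tolerated


if __name__ == "__main__":
    print(velocity_noise_std(1.0))  # velocity noise for 1 degree of angle noise
```

The helper `velocity_noise_std` shows why the tolerance on the angles is tight: differencing at 120 Hz amplifies independent angle noise by a factor of sqrt(2) × 120, so 1° of angle noise corresponds to roughly 170°/s of velocity noise.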
Step C: Learning Precise SE
Precise predictions require an unknown minimum network
capacity, which makes it difficult to reduce the execution
time with limited computational resources. We balance this
tradeoff with a deliberate choice of the DNN architecture, a
biased data set, and data augmentation methods.
To increase precision, we simplify the learning task and
train a DNN by using standard convolutional layers to estimate
only the pose from a single image. Velocities are then
computed from a buffer of previously estimated positions
and velocities via finite differences and a first-order low-pass
filter (see Figure 3). This structure reduces the SE's
prediction error to roughly a fifth compared to a recurrent
neural network architecture of similar size, which we speculate
is due to the freed capacity being available for higher
accuracy on a simpler task. Alternatively, velocities could be
estimated by using a history of images as input, but again,
this would significantly increase the network size, which we
need to reduce as much as possible.
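The velocity computation described above can be sketched as a backward finite difference followed by a first-order low-pass filter. The class name, the filter gain `alpha`, and the zero initial velocity are assumptions made for illustration; the filter parameters actually used are not given in this section.

```python
class VelocityEstimator:
    """Backward finite difference followed by a first-order low-pass filter."""

    def __init__(self, dt, alpha):
        # dt:    sample period in seconds
        # alpha: filter gain in (0, 1]; 1 disables filtering (assumed parameter)
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.dt = dt
        self.alpha = alpha
        self.prev_position = None  # last estimated position
        self.velocity = 0.0        # last filtered velocity

    def update(self, position):
        """Consume one estimated position and return the filtered velocity."""
        if self.prev_position is None:
            # No previous position yet, so keep the initial velocity.
            self.prev_position = position
            return self.velocity
        raw = (position - self.prev_position) / self.dt
        # First-order low-pass: move the estimate a fraction alpha toward raw.
        self.velocity += self.alpha * (raw - self.velocity)
        self.prev_position = position
        return self.velocity


if __name__ == "__main__":
    estimator = VelocityEstimator(dt=1.0 / 120.0, alpha=0.5)
    for angle_deg in [0.0, 0.5, 1.0, 1.5]:
        print(estimator.update(angle_deg))
```

The estimator stores only the previous position and the previous filtered velocity, which mirrors the buffer of earlier estimates mentioned in the text. Angle wrap-around at ±180° is not handled in this sketch.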
We also down-sample the input image from 540 × 720 to
220 × 220 pixels, which enables the DNN depth to be
increased; we found this was more important for precision
than a higher image resolution. To compensate for the down-sampling,
we use a small stride of one pixel per step.
With a depth of 12 layers, the SE is precise enough to
distinguish individual pixels.
Despite these measures, the limited network size makes
it difficult for the DNN to converge to a low error everywhere.
Precise state estimates are often not needed
throughout the entire state space, and we can evaluate
where the SE should be more precise based on the policy
analysis conducted in "Step B: Policy Analysis and Data
Collection." For the Furuta pendulum, we bias the training
data set to be more densely sampled around the upper
equilibrium point. An SE trained on a very biased data set
can meet our convergence criterion after just four episodes
of training. Due to its reliably low prediction error for
small angles (refer to Figure 4), the RL agent could also
adapt much faster.
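One simple way to bias a data set toward the upper equilibrium is weighted resampling, sketched below. This is an assumption-laden illustration rather than the method used here: the function name, the weight `near_weight`, and the reuse of the 10° balancing region as the boundary are all hypothetical choices.

```python
import random


def biased_sample_indices(angles_deg, n, near_deg=10.0, near_weight=5.0, rng=None):
    """Draw n sample indices with replacement, favoring samples whose
    pendulum angle lies within near_deg of the upper equilibrium (0 degrees).

    near_deg and near_weight are illustrative values, not taken from the text.
    """
    rng = rng or random.Random()
    weights = [near_weight if abs(a) < near_deg else 1.0 for a in angles_deg]
    return rng.choices(range(len(angles_deg)), weights=weights, k=n)


if __name__ == "__main__":
    angles = [-120.0, -60.0, -5.0, 0.0, 5.0, 60.0, 120.0]
    picked = biased_sample_indices(angles, n=1000, rng=random.Random(0))
    near = sum(1 for i in picked if abs(angles[i]) < 10.0)
    print(near / len(picked))  # fraction of draws near the equilibrium
```

Raising `near_weight` concentrates the training data around the balancing region at the cost of coverage elsewhere, which matches the tradeoff described above.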
To avoid overfitting to the training data set, and to increase
the SE's robustness, we also apply data augmentation methods
[20] during training. The input images are randomly zoomed,
