IEEE Computational Intelligence Magazine - August 2018 - 48

Haibo He
Department of Electrical, Computer and Biomedical
Engineering, University of Rhode Island, Kingston, RI, USA
Xiangnan Zhong
Department of Electrical Engineering,
University of North Texas, Denton, TX, USA

Learning Without External Reward

Abstract

In the traditional reinforcement
learning paradigm, a reward signal is
applied to define the goal of the
task. Usually, the reward signal is a
"hand-crafted" numerical value or a
pre-defined function: it tells the agent
how good or bad a specific action is.
However, we believe there exist situations in which the environment cannot
directly provide such a reward signal to
the agent. The question, therefore, is
whether an agent can still learn without
an external reward signal.
To this end, this article
develops a self-learning approach that enables the
agent to adaptively develop an
internal reward signal based on
a given ultimate goal, without
requiring an explicit external
reward signal from the environment. In this article, we aim
to convey the self-learning
idea in a broad sense, which
could be used in a wide range
of existing reinforcement
learning and adaptive dynamic
programming algorithms and architectures. We describe the idealized forms of
this method mathematically, and also
demonstrate its effectiveness through a
triple-link inverted pendulum case study.

Digital Object Identifier 10.1109/MCI.2018.2840727
Date of publication: 18 July 2018

Corresponding Author: Haibo He (Email: haibohe@uri.edu)

I. Introduction

When we first think about the nature of
learning, we probably start with the idea
that we learn by interacting with the
environment [1]-[3]. For instance, when
we try to hold a conversation with others, we need to decide what to say based
on the people we are talking to as well as
the conversational context. Over the past
several decades, many researchers have
explored computational approaches to
learn from active interactions with the
environment, such as reinforcement
learning (RL) and adaptive dynamic programming (ADP). Imagine we hope to
train a monkey to learn the result of
"1+1". At first, we present two cards with
the number "1" on them to the monkey: if it
picks a card with the number "2" on it
from a box, we present a banana as a
reward. In this way, although the monkey
does not know the exact meaning of
math, it knows that a banana will be
given when it picks the appropriate card.
Therefore, the banana reward plays an
important role in the learning process. In
general, the key element of RL is defined
by the reward signal, which is given by
the environment [1], [4]-[6]. In order to
achieve goals, the agent chooses a set of
actions that maximize the expected total
rewards it receives over time. Therefore,
RL achieves goals by defining the
interaction between an agent and its
environment in terms of states, actions,
and rewards [1], [7]. Recently, the
development of deep RL [8]-[10] has
attracted increasing attention, especially
for the level of intelligence it
has achieved.
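
The objective described above, choosing actions that maximize the expected total reward over time, is commonly written in the following form. This is a standard textbook formulation given here only as a sketch: the symbols x_t (state), u_t (action), r (immediate reward), the policy \pi, and the discount factor \gamma with 0 < \gamma \le 1 are assumptions introduced for illustration and do not appear in this excerpt.

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(x_t, u_t) \right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```

Choosing \gamma < 1 keeps the sum bounded whenever the immediate reward is bounded, which is one common reason a discount factor is included.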
So far, many RL/ADP
designs focus on how to calculate and maximize the cumulative rewards [11]-[15]. Usually,
it is assumed that the agent
knows what the immediate
reward is or how the immediate reward is computed as a
function of the actions and
states in which they are taken
[16]. There are several approaches in the literature to
define such a reward signal. For
instance, a typical approach is to use a
binary signal, e.g., using a "0" or "−1" to
represent "success" or "failure" of an
action [17], or a semi-binary reward signal, e.g., using "0, −0.4, −1" as a more
informative representation [18]. Another
way to define the reward signal is to use
a quadratic function based on the system
states and actions [19]-[22]. This type of

1556-603X/18 © 2018 IEEE
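
The three kinds of hand-crafted reward signal mentioned above (binary, semi-binary, and quadratic) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the `warn` and `fail` thresholds that decide when the semi-binary signal switches between 0, −0.4, and −1, the diagonal weights, and the negative sign on the quadratic form are all hypothetical choices, not taken from the cited works.

```python
from typing import Sequence


def binary_reward(failed: bool) -> float:
    """Binary signal: 0 for success, -1 for failure."""
    return -1.0 if failed else 0.0


def semi_binary_reward(deviation: float, warn: float, fail: float) -> float:
    """Semi-binary signal with the three levels 0, -0.4, -1.

    Assumption: the middle level marks a deviation that is past a
    hypothetical warning threshold but has not yet reached failure.
    """
    d = abs(deviation)
    if d >= fail:
        return -1.0
    if d >= warn:
        return -0.4
    return 0.0


def quadratic_reward(
    state: Sequence[float],
    action: Sequence[float],
    q_weights: Sequence[float],
    r_weights: Sequence[float],
) -> float:
    """Quadratic signal in states and actions with diagonal weights.

    Assumption: the quadratic form is negated so that a larger value
    is better, matching the reward convention of the other functions.
    """
    state_cost = sum(q * x * x for q, x in zip(q_weights, state))
    action_cost = sum(r * u * u for r, u in zip(r_weights, action))
    return -(state_cost + action_cost)
```

In each case the designer must decide in advance what counts as good or bad behavior, which is the kind of external, pre-defined signal the abstract contrasts with an internally developed reward.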


