Instrumentation & Measurement Magazine 24-2 - 120

evader along the ℓ-direction, while $K_k^{\ell} \in \mathbb{R}^{1\times 3}$ represents the
control gains, which are computed in real time by the RL
algorithm.
One of the salient features of this setup is its flexibility to
adapt to problems of varying complexity. For instance, for the application at hand, it was found sufficient to
include only the last three tracking errors in $E_k$. However, it
is possible to add or remove error signals without
having to redesign the control algorithm. The same remark
applies to modeling the incremental difference in the
surge speed and the orientation angle. In (1a) and (1b), the
speed update $v_{k+1}$ is shaped from $v_k$ and the control pair $(u_k^{v}, u_{k-1}^{v})$, and the orientation-angle update
is shaped from the current angle and its current control signal.
Changing them to other forms, including
$(u_k^{v}, u_{k-1}^{v}, u_{k-2}^{v}, \ldots)$ for the speed and a similar history of control signals for the orientation angle, for example, does not require
restructuring the algorithm. This illustrates the flexibility
inherent in the learning system where, using only a few self-adapted control gains, a filtering behavior can be obtained
during the online learning process.
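The flexibility described above can be illustrated with a minimal Python sketch. It assumes the policy is a linear combination of the stored tracking errors, u_k = K_k E_k, which is how the gain is used later in the text; the function name `make_policy` and the gain values are hypothetical, and the gains are held fixed here, whereas in the article they are adapted online by the RL algorithm.

```python
from collections import deque


def make_policy(gains):
    """Return a stateful policy u_k = K_k . E_k over the most recent errors.

    `gains` plays the role of K_k, with one entry per stored tracking error.
    The linear form is an assumption made for illustration.
    """
    n = len(gains)
    # E_k = [e_k, e_{k-1}, ..., e_{k-n+1}], initialized with zeros.
    errors = deque([0.0] * n, maxlen=n)

    def policy(new_error):
        errors.appendleft(new_error)  # newest error first, oldest drops out
        return sum(g * e for g, e in zip(gains, errors))

    return policy


# Three-error window, as in the article's application (gain values are made up).
policy3 = make_policy([-0.8, 0.3, 0.1])
u = [policy3(e) for e in (1.0, 0.5, 0.25)]

# Widening the error window only changes the length of the gain vector.
policy4 = make_policy([-0.8, 0.3, 0.1, 0.05])
u4 = policy4(1.0)
```

In this sketch, adding or removing an error signal changes only the length of the gain vector; the policy code itself is untouched.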

Temporal Difference Solution
For the RL framework to provide a solution for the pursuer-evader optimization problem, we need to derive the
necessary temporal difference equations (i.e., Bellman
equations) along with the required optimality conditions.
The ultimate objective is to converge to the best policies
for the surge speed ($u_k^{v*}$) and the orientation angle of each pursuer in order to intercept the evader
ship. Hence, two convex objective functions are considered (each corresponding to the motion in one of the x-y
directions)

$$U^{\ell}(E_k, u_k) = \frac{1}{2}\left(E_k^{T} Q^{\ell} E_k + R^{\ell} u_k^{2}\right),$$







where $\ell \in \{x, y\}$, $0 < Q^{\ell} \in \mathbb{R}^{3\times 3}$ (positive definite), and $0 < R^{\ell} \in \mathbb{R}$.
The quality of the applied control strategies is assessed through
a performance measure

$$J_k^{\ell} = \sum_{i=k}^{\infty} U^{\ell}(E_i, u_i) \qquad (3)$$

Taking advantage of the policy expression (2), the optimization process aims at minimizing the performance index $J_k^{\ell}$ by
optimizing the gain vector $K_k^{\ell}$. This iterative process may lead
the obtained policy to converge in practice towards the optimal policy $u_k^{*} = \arg\min_{u_k} J_k^{\ell}$, although it may not reach its
theoretical value.
A common problem in RL is that $J_k^{\ell}$ cannot be calculated directly.
Motivated by the structure of the cost function $U^{\ell}$, $J_k^{\ell}$ is approximated by a value function $V^{\ell}$ that depends only on two
time-varying signals, $E_k$ and $u_k$:

$$J_k^{\ell} \approx V^{\ell}(E_k, u_k) \qquad (4)$$

From (3) and (4), the temporal difference equation is
formed as

$$V^{\ell}(E_k, u_k) = U^{\ell}(E_k, u_k) + V^{\ell}(E_{k+1}, u_{k+1}) \qquad (5)$$

As with the cost function $U^{\ell}$, the value function is set as a
quadratic function in the error vector $E_k$ and the control signal $u_k$, such that

$$V^{\ell}(E_k, u_k) = \frac{1}{2}\begin{bmatrix} E_k^{T} & u_k^{T} \end{bmatrix} H^{\ell} \begin{bmatrix} E_k \\ u_k \end{bmatrix},$$

where $H^{\ell}$ is a symmetric positive definite matrix with the following block structure

$$H^{\ell} = \begin{bmatrix} H^{\ell}_{EE} & H^{\ell}_{Eu} \\ H^{\ell}_{uE} & H^{\ell}_{uu} \end{bmatrix}$$

Note that the structure of the solving value function plays
an important role in how an RL algorithm decides on the best
strategies [14]. With the chosen quadratic structure, the pursuer-evader optimization problem is reduced to finding the
optimal solving value functions, by optimizing $H^{\ell}$, and consequently computing the optimal control gain $K_k^{\ell *}$ using Bellman
optimality principles [11]. To this end, we can conclude that the
optimal policy is

$$u_k^{*} = \arg\min_{u_k} V^{\ell}(E_k, u_k) = -\left(H^{\ell *}_{uu}\right)^{-1} H^{\ell *}_{uE}\, E_k \qquad (6)$$

Relations (2) and (6) yield the optimal gain

$$K^{\ell *} = -\left(H^{\ell *}_{uu}\right)^{-1} H^{\ell *}_{uE}$$

Applying the optimal policy (6) in (5) yields the following
Bellman optimality equation

$$V^{\ell *}(E_k, u_k^{*}) = U^{\ell *}(E_k, u_k^{*}) + V^{\ell *}(E_{k+1}, u_{k+1}^{*}) \qquad (7)$$

Online Reinforcement Learning Solution
In most real-world scenarios, it is impossible to find theoretical solutions for (6) and (7). To alleviate this problem, an online
Value Iteration algorithm is employed herein to iteratively approximate such solutions.

Online Value Iteration
Value Iteration is a two-step technique that is used to find the
solutions of many temporal difference forms, such as the Bellman optimality equations (7). In the first step, the solving
value function is updated as

$$V^{\ell (r+1)}(E_k, u_k) = U^{\ell (r)}(E_k, u_k) + V^{\ell (r)}(E_{k+1}, u_{k+1})$$

where r is the value-update index. The second step uses this
update to suggest a new (improved) strategy, given by

$$u_k^{(r+1)} = -\left(H^{\ell (r+1)}_{uu}\right)^{-1} H^{\ell (r+1)}_{uE}\, E_k.$$

The algorithm cycles through this process until convergence is achieved. The Value Iteration technique is generally
proven to converge. It yields a bounded sequence of non-decreasing solving value functions if the initial solving value

IEEE Instrumentation & Measurement Magazine	

April 2021


