
given our data decomposes as follows (note that we ignore the possible dependence of the prior $p(w_s)$ on $x_{is}$ or $\sigma^2$):

$p(w_s \mid y_{is}, x_{is}, \sigma^2) \propto p(y_{is} \mid w_s, x_{is}, \sigma^2)\, p(w_s).$  (2)

With the model from the previous section and the assumption of Gaussian noise, $y_{is} \mid w_s, x_{is}, \sigma^2 \sim \mathcal{N}(w_s^T x_{is}, \sigma^2)$, and assuming our samples $x_{is}$ are independent, we may derive the negative log likelihood as follows:

$p(y_{1s}, \ldots, y_{n_s s} \mid x_{1s}, \ldots, x_{n_s s}, w_s, \sigma^2) = \prod_{i=1}^{n_s} \mathcal{N}(y_{is}; w_s^T x_{is}, \sigma^2)$  (3)

$LL(w_s; D_s, \sigma^2) = \frac{1}{2\sigma^2} \sum_{i=1}^{n_s} \left( y_{is} - w_s^T x_{is} \right)^2.$  (4)
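To make the loss concrete, here is a minimal numpy sketch of Eq. (4); the array names and the function signature are illustrative assumptions, not from the article:

```python
import numpy as np

# A minimal sketch of the loss in Eq. (4), assuming numpy arrays.
# X_s is n_s x d (one trial per row), y_s is length n_s, w_s is length d.
def neg_log_likelihood(w_s, X_s, y_s, sigma2):
    """LL(w_s; D_s, sigma^2) = 1/(2 sigma^2) * sum_i (y_is - w_s^T x_is)^2."""
    residuals = y_s - X_s @ w_s            # one residual per trial
    return np.sum(residuals ** 2) / (2.0 * sigma2)
```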

The negative log likelihood defines a convenient loss function, as its value increases with the square of the difference between our prediction $w_s^T x_{is}$ and the true label $y_{is}$ for each data point. For notational convenience, we write the loss in matrix form by defining the input matrix $X = [x_1, \ldots, x_n]^T$ and the output vector $y = [y_1, \ldots, y_n]^T$. Then, the loss for subject/session $s$ is given by $\|X_s w_s - y_s\|^2$, where $\|\cdot\|$ is the $\ell_2$ or Euclidean norm. If we ignore the prior and solve for $w_s$ analytically from here, we end up with the equations for regular linear regression.
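For a sanity check, this prior-free solution is plain least squares; a toy numpy sketch (the synthetic data and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X_s = rng.standard_normal((20, 3))      # n_s = 20 trials, d = 3 features (toy data)
y_s = X_s @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.standard_normal(20)

# Minimizing ||X_s w_s - y_s||^2 with no prior is ordinary least squares.
w_ols, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
# Equivalent via the normal equations, when X_s^T X_s is invertible:
# w_ols = np.linalg.solve(X_s.T @ X_s, X_s.T @ y_s)
```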
It is well known that complex models trained without a validation dataset can over-fit, leading to poor generalization to new data points. A classical technique to control over-fitting is to add a penalty term to the loss function that reduces the complexity of the model. A common choice for this regularizer is the sum of the squares of the weight parameters,

$\Omega(w_s) = \|w_s\|_2^2.$  (5)

Addition of $\Omega(w_s)$ to the optimization problem is equivalent to assuming a Gaussian prior on $w_s$ with zero mean and unit covariance $I$ and incorporating this prior in the log-scale.¹ If the variance of the prior is not assumed to be exactly the identity matrix but rather some matrix $\alpha I$, then this formulation describes ridge regression.

¹ Note that $LL(w_s; D_s, \sigma^2) + \Omega(w_s)$ gives the negative log posterior for $w_s$ given $D_s$ and the assumed prior.
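For illustration, a sketch of the ridge estimator this paragraph describes; the single constant `reg` is an assumed name that absorbs the ratio of the noise variance to the prior variance:

```python
import numpy as np

# Ridge regression sketch: a zero-mean Gaussian prior with covariance
# alpha*I adds a scaled identity to the normal equations. `reg`
# (illustrative) stands in for the noise-to-prior variance ratio.
def ridge(X_s, y_s, reg):
    d = X_s.shape[1]
    return np.linalg.solve(X_s.T @ X_s + reg * np.eye(d), X_s.T @ y_s)
```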
However, the above assumption is rarely a reasonable one. If there exists some better prior information on the distribution of the weights that can be represented by a mean $\mu$ and covariance $\Sigma$, this information can be used instead in the regularizer by assuming a Gaussian distribution with the corresponding mean and covariance, $\mathcal{N}(\mu, \Sigma)$, as the prior and defining the regularizer as the negative log prior probability

$\Omega(w_s; \mu, \Sigma) = \frac{1}{2}(w_s - \mu)^T \Sigma^{-1} (w_s - \mu) + \frac{1}{2}\log\det(\Sigma).$  (6)

Note that the last term is constant with respect to $w_s$ for fixed $\Sigma$, and further that $\Omega(w_s; 0, I)$ is equivalent to (5).
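A direct numpy transcription of Eq. (6) may help; `slogdet` is used for numerical stability, and all names are illustrative:

```python
import numpy as np

# Sketch of the regularizer in Eq. (6): the negative log of the
# Gaussian prior N(mu, Sigma), up to an additive constant.
def omega(w_s, mu, Sigma):
    resid = w_s - mu
    quad = resid @ np.linalg.solve(Sigma, resid)   # (w_s - mu)^T Sigma^{-1} (w_s - mu)
    _, logdet = np.linalg.slogdet(Sigma)           # log det(Sigma), computed stably
    return 0.5 * quad + 0.5 * logdet
```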
The new loss function can then be derived by taking the negative logarithm of the posterior of $w_s$:


$p(w_s \mid y_{is}, x_{is}, \mu, \Sigma, \lambda) \propto \mathcal{N}(y_{is}; w_s^T x_{is}, \lambda)\, \mathcal{N}(w_s; \mu, \Sigma)$  (7)

$LP(w_s; D_s, \mu, \Sigma, \lambda) = \frac{1}{\lambda}\|X_s w_s - y_s\|^2 + \Omega(w_s; \mu, \Sigma) + C.$  (8)

We replace $\sigma^2$ with $\lambda$ to emphasize that, in the loss function, the variance of the original noise model acts as a term that controls the ratio of the importance assigned to the prior probability of the learned weight vector versus how well the learned vector can predict the labels in the training data. Put another way, the higher the variance of the noise in the model, the less we can trust our training data to lead us to a good solution; moving forward, it is more convenient to think of this variable in terms of this trade-off than as purely a noise variance. From this point the actual optimization problem can be formulated as

$\min_{w_s} LP(w_s; D_s, \mu, \Sigma, \lambda).$  (9)
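Because $LP$ is quadratic in $w_s$ for fixed $(\mu, \Sigma, \lambda)$, problem (9) admits a closed-form solution by setting the gradient to zero; a numpy sketch under that derivation, with constant factors absorbed into $\lambda$ and all names illustrative:

```python
import numpy as np

# Sketch of the minimizer of Eq. (9): the gradient condition
# (2/lam) X_s^T (X_s w_s - y_s) + Sigma^{-1} (w_s - mu) = 0
# yields a regularized normal-equations system.
def fit_subject(X_s, y_s, mu, Sigma, lam):
    Sigma_inv = np.linalg.inv(Sigma)
    A = (2.0 / lam) * (X_s.T @ X_s) + Sigma_inv
    b = (2.0 / lam) * (X_s.T @ y_s) + Sigma_inv @ mu
    return np.linalg.solve(A, b)
```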

2.3. Training Models for Subjects/Sessions Jointly

In a standard machine learning setting, there is a single prediction problem or task to model, and there is usually no prior information on the distribution of the model parameters $w$. However, if there are multiple prediction tasks that are related to each other, it is possible to use information from all the tasks in order to improve the inferred model of each task. In particular, if the tasks share a common structure along with some task-specific variations, the shared structure can be used as the prior information $(\mu, \Sigma)$ in (6) in order to ensure that the solutions to all the tasks are close to each other in some space.
In the BCI training problem, we treat each subject/session as one task, and the shared structure $(\mu, \Sigma)$ represents the subject/session-invariant characteristics of stimulus prediction. More precisely, $(\mu, \Sigma)$ are the mean vector and covariance matrix of the prior distribution over the weight vectors. As such, $\mu$ defines an out-of-the-box BCI that can be used to classify data recorded from a novel subject/session without any subject/session-specific calibration process. The divergence of a subject/session model from the shared structure, $w_s - \mu$, represents the subject/session-specific characteristics of the stimulus prediction.
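As a usage note, this zero-calibration scenario amounts to decoding with $\mu$ directly; a hypothetical sketch in which `X_new` and `mu` are illustrative stand-ins:

```python
import numpy as np

# Hypothetical zero-calibration decoding: apply the shared mean mu,
# learned from previous subjects/sessions, to a new subject's trials.
X_new = np.random.default_rng(1).standard_normal((10, 3))  # toy trials from a novel subject
mu = np.zeros(3)                                           # stands in for a learned shared mean
y_pred = X_new @ mu                                        # predictions with no calibration data
```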
Clearly, the shared structure is unknown in this setting. Our goal is to infer the shared structure, $(\mu, \Sigma)$, from all the tasks jointly along with the model parameters $w_s$. This can be achieved by combining the optimization problems of all tasks:

$\min_{W, \mu, \Sigma} LP(W, \mu, \Sigma; D, \lambda) = \min_{W, \mu, \Sigma} \frac{1}{\lambda} \sum_s \|X_s w_s - y_s\|^2 + \sum_s \Omega(w_s; \mu, \Sigma),$  (10)

where $W = [w_1, \ldots, w_S]^T$, $D = \{D_s\}_{s=1}^S$, and $d$ is the dimension of each weight vector. Let us investigate each term of this optimization problem separately. The first term is the sum of the losses from each session, and by minimizing it we ensure all the sessions are well fitted. The second term controls the divergence of each subject/session model from the underlying mean vector $\mu$ and penalizes the elements of the residual $\tilde{w}_s = w_s - \mu$, scaling with $\Sigma^{-1}$; a numerical sketch of this joint problem is given below.
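To illustrate the structure of (10), here is a hedged numpy sketch of one natural approach: alternate between the per-task solve of (9) for fixed $(\mu, \Sigma)$ and the closed-form Gaussian updates for $(\mu, \Sigma)$ given the current weights. This is a sketch under those assumptions, not necessarily the authors' exact procedure, and all names are illustrative:

```python
import numpy as np

# Alternating-minimization sketch for Eq. (10). Step (i) solves Eq. (9)
# per subject/session for fixed (mu, Sigma); step (ii) updates (mu, Sigma)
# with the closed-form Gaussian maximum-likelihood estimates given W.
def joint_fit(Xs, ys, lam, n_iters=50, eps=1e-6):
    S, d = len(Xs), Xs[0].shape[1]
    mu, Sigma = np.zeros(d), np.eye(d)
    W = np.zeros((S, d))
    for _ in range(n_iters):
        Sigma_inv = np.linalg.inv(Sigma)
        for s in range(S):                        # step (i): per-task MAP solution
            A = (2.0 / lam) * (Xs[s].T @ Xs[s]) + Sigma_inv
            b = (2.0 / lam) * (Xs[s].T @ ys[s]) + Sigma_inv @ mu
            W[s] = np.linalg.solve(A, b)
        mu = W.mean(axis=0)                       # step (ii): shared mean
        R = W - mu                                # residuals w_s - mu, row-wise
        Sigma = (R.T @ R) / S + eps * np.eye(d)   # shared covariance (jittered for stability)
    return W, mu, Sigma
```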
Expanding one of these terms,


