Systems, Man & Cybernetics - April 2016 - 30
where $|S|$ is the number of all samples in $S$, and $|S^+|$ and $|S^-|$, respectively, denote the numbers of positive-class and negative-class samples in $S$. When $|S| = |S^+|$ or $|S| = |S^-|$, $CE_2(P)$ reaches the minimum of 0; when $|S^+| = |S^-|$, $CE_2(P)$ reaches the maximum of 1.
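As a quick numerical check of these extremes, the sketch below computes classification entropy from per-class sample counts. It assumes the entropy is the Shannon entropy of the class proportions with base-2 logarithms, which is consistent with the stated minimum of 0 and maximum of 1 for two classes; the function name `classification_entropy` is illustrative and not taken from the article.

```python
import math

def classification_entropy(counts):
    """Entropy (base 2) of class proportions, given per-class sample counts.

    Assumption: CE = -sum_k p_k * log2(p_k) with p_k = count_k / total,
    and empty classes contribute 0 (the usual 0 * log2(0) = 0 convention).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one sample is required")
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(classification_entropy([10, 0]))  # all samples in one class: 0.0
print(classification_entropy([5, 5]))   # balanced two classes: 1.0
print(classification_entropy([8, 2]))   # in between, roughly 0.72
```

Because the function takes a list of counts of any length, the same sketch also covers problems with more than two classes.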
Similarly, classification entropy for a C-class problem is defined as

$$CE_C(P) = -\sum_{k=1}^{C} \frac{|S_k|}{|S|} \log_2 \frac{|S_k|}{|S|},$$

where $|S_k|$ is the number of the $k$th class samples in $S$.
◆ Nonspecificity. Nonspecificity, also known as ambiguity, is another measure of the uncertainty of a fuzzy subset $A = \{\mu_1, \mu_2, \ldots, \mu_n\}$. The nonspecificity or ambiguity of fuzzy subset $A$ is defined as

$$\mathrm{Ambig}(A) = \sum_{i=1}^{n} \left[ (\mu_i^* - \mu_{i+1}^*) \ln i \right].$$
◆ Fuzziness. Uncertainty always exists in human language, e.g., in words such as young and old. What, then, is the boundary between young and old? Fuzzy subsets are used to measure this kind of uncertainty in human language. For a universe $U = \{x_1, x_2, \ldots, x_n\}$, a fuzzy subset $A$ of $U$ is defined as

$$A = \{\mu_A(x_1), \mu_A(x_2), \ldots, \mu_A(x_n)\},$$

where $\mu_A$, called the membership function of $A$, is a mapping from $U$ to [0, 1]. Assume there are three fuzzy subsets: $A_1 = \{0.7, 0.4, 0.1\}$, $A_2 = \{0.8, 0.2, 0\}$, and $A_3 = \{1, 0, 0\}$. Fuzziness is a measure that helps us determine which one is more or less fuzzy. The fuzziness of a fuzzy subset $A$ is defined as

$$\mathrm{Fuzz}(A) = -\frac{1}{n} \sum_{i=1}^{n} \left[ \mu_A(x_i) \log_2 \mu_A(x_i) + (1 - \mu_A(x_i)) \log_2 (1 - \mu_A(x_i)) \right].$$
[Figure 3 diagram: a big data set or its samples (classes C1, C2, C3) is given to Classifier A and Classifier B. For training cases 1 to 4, one classifier outputs the membership vectors (0.9, 0.0, 0.1), (0.8, 0.2, 0.0), (0.1, 0.2, 0.7), (0.1, 0.8, 0.1) and the other outputs (0.6, 0.0, 0.4), (0.7, 0.3, 0.0), (0.0, 0.4, 0.6), (0.4, 0.6, 0.0). In both cases the listed classes are C1, C1, C3, C2, leading to the final training accuracy.]
Thus, among the three fuzzy subsets above, $A_1$ is the most fuzzy because $\mathrm{Fuzz}(A_1) > \mathrm{Fuzz}(A_2) > \mathrm{Fuzz}(A_3)$.
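The ordering can be verified directly from the definition of fuzziness. The following is a minimal sketch, assuming the usual convention that $0 \log_2 0 = 0$ for crisp membership degrees; the helper name `fuzziness` is illustrative.

```python
import math

def fuzziness(memberships):
    """Fuzz(A) = -(1/n) * sum_i [mu_i*log2(mu_i) + (1 - mu_i)*log2(1 - mu_i)]."""
    def term(mu):
        # Convention 0 * log2(0) = 0: degrees of exactly 0 or 1 add nothing.
        return -sum(p * math.log2(p) for p in (mu, 1.0 - mu) if p > 0.0)
    return sum(term(mu) for mu in memberships) / len(memberships)

A1 = [0.7, 0.4, 0.1]
A2 = [0.8, 0.2, 0.0]
A3 = [1.0, 0.0, 0.0]

for name, subset in (("A1", A1), ("A2", A2), ("A3", A3)):
    print(name, round(fuzziness(subset), 4))
# Roughly 0.7737 for A1, 0.4813 for A2, and 0.0 for A3.
```

A crisp subset such as $A_3$ has zero fuzziness, and the value grows as membership degrees move toward 0.5.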
Figure 3. The general framework of uncertainty-based learning for big data. Classifier A has the same training accuracy as Classifier B, but Classifier A has a smaller uncertainty (e.g., fuzziness or ambiguity) than Classifier B. For some types of big data (not all), Classifier A is then said to generalize better than Classifier B, which provides quite a different viewpoint for designing learning algorithms in comparison to the traditional pattern-recognition viewpoint.
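To make the caption concrete, the sketch below takes the two sets of membership vectors shown in Figure 3 and checks that they lead to the same predicted classes while differing in average fuzziness. Which set belongs to Classifier A is not recoverable from the figure as extracted here, so `outputs_1` and `outputs_2` are neutral placeholder names. Predicting the class with the largest membership degree and averaging fuzziness over cases are assumptions made only for illustration.

```python
import math

def fuzziness(memberships):
    """Average binary-entropy fuzziness of one membership vector."""
    def term(mu):
        # Convention 0 * log2(0) = 0.
        return -sum(p * math.log2(p) for p in (mu, 1.0 - mu) if p > 0.0)
    return sum(term(mu) for mu in memberships) / len(memberships)

def predicted_class(memberships):
    """Index of the largest membership degree (0 -> C1, 1 -> C2, 2 -> C3)."""
    return max(range(len(memberships)), key=memberships.__getitem__)

def mean_fuzziness(outputs):
    """Mean fuzziness over all cases (an illustrative aggregate)."""
    return sum(fuzziness(v) for v in outputs) / len(outputs)

# Membership vectors for cases 1 to 4 as they appear in Figure 3.
outputs_1 = [(0.9, 0.0, 0.1), (0.8, 0.2, 0.0), (0.1, 0.2, 0.7), (0.1, 0.8, 0.1)]
outputs_2 = [(0.6, 0.0, 0.4), (0.7, 0.3, 0.0), (0.0, 0.4, 0.6), (0.4, 0.6, 0.0)]

same_predictions = (
    [predicted_class(v) for v in outputs_1]
    == [predicted_class(v) for v in outputs_2]
)
print(same_predictions)                                      # True
print(mean_fuzziness(outputs_1) < mean_fuzziness(outputs_2))  # True
```

Both output sets yield C1, C1, C3, C2, so accuracy alone cannot separate them, whereas the first set is clearly less fuzzy than the second.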
In the definition of nonspecificity, $A^* = \{\mu_1^*, \mu_2^*, \ldots, \mu_n^*\}$ is a permutation of the membership degrees of $A$ sorted so that $\mu_i^* \geq \mu_{i+1}^*$ for any $i$, with $\mu_{n+1}^* = 0$.
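The nonspecificity measure can be sketched in a few lines. This is a minimal illustration that sorts the membership degrees in nonincreasing order, appends $\mu_{n+1}^* = 0$, and uses the natural logarithm as in the definition; the function name `ambiguity` is illustrative.

```python
import math

def ambiguity(memberships):
    """Ambig(A) = sum_{i=1..n} (mu*_i - mu*_{i+1}) * ln(i).

    mu* is the list of membership degrees sorted in nonincreasing order,
    extended with mu*_{n+1} = 0.
    """
    ordered = sorted(memberships, reverse=True) + [0.0]
    # ordered[i - 1] is mu*_i and ordered[i] is mu*_{i+1} for i = 1..n.
    return sum(
        (ordered[i - 1] - ordered[i]) * math.log(i)
        for i in range(1, len(ordered))
    )

print(ambiguity([1.0, 0.0, 0.0]))  # one clear winner: 0.0
print(ambiguity([1.0, 1.0, 1.0]))  # three equally possible options: ln(3)
print(ambiguity([0.7, 0.4, 0.1]))  # in between
```

The measure is zero when a single element carries all the membership and largest when all elements are equally possible.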
◆ Rough degree. For the rough set $(\underline{R}X, \overline{R}X)$ of $X$, its rough degree is defined as

$$RD(X) = 1 - \frac{|\underline{R}X|}{|\overline{R}X|},$$

where $\underline{R}X = \{x \in U \mid [x]_R \subseteq X\}$ is the lower approximation of $X$, $\overline{R}X = \{x \in U \mid [x]_R \cap X \neq \emptyset\}$ is the upper approximation of $X$, $U$ is the universe of discourse, $R$ is an equivalence relation, $X$ is a subset of $U$, and $[x]_R = \{y \mid y \in U, yRx\}$ is an equivalence class.
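The rough degree can be illustrated on a small finite universe. In the sketch below, the equivalence relation is represented by a hypothetical `key` function (two elements are equivalent when their keys are equal); this representation and the names `rough_degree` and `pair_id` are assumptions for illustration only.

```python
def rough_degree(universe, key, target):
    """RD(X) = 1 - |lower(X)| / |upper(X)| on a finite universe.

    Elements x and y are treated as equivalent when key(x) == key(y).
    """
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)

    target = set(target)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside X
            lower |= block
        if block & target:    # class overlaps X
            upper |= block
    if not upper:
        raise ValueError("X must contain at least one element of the universe")
    return 1 - len(lower) / len(upper)

universe = range(1, 9)

def pair_id(x):
    """Equivalence classes {1,2}, {3,4}, {5,6}, {7,8}."""
    return (x - 1) // 2

print(rough_degree(universe, pair_id, {1, 2, 3}))     # lower {1,2}, upper {1,2,3,4}: 0.5
print(rough_degree(universe, pair_id, {1, 2, 3, 4}))  # exactly definable set: 0.0
```

A set that is a union of equivalence classes has rough degree 0, and the value approaches 1 as the lower approximation shrinks relative to the upper one.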
Some Studies on Learning from Uncertainty for Big Data
Here, we briefly introduce two studies on uncertainty-based learning for big data. One is fuzziness-based semisupervised learning, and the other is the ambiguity-based model tree (AMT) for handling mixed attributes. The first study fits within the general framework [19] of uncertainty-based learning for big data shown in Figure 3.
Fuzziness-Based Semisupervised Learning
Assume that A is a big data set in which most cases have no labels, and that B is a small part of A in which each case has a label. We can train a classifier from B, but we cannot expect good prediction performance on A - B, the unlabeled remainder. Based on the prediction for each case in A - B, we would like to select some cases from A - B and add them, together with their predicted labels, to B. The prediction accuracy on A - B is expected to improve after retraining on the enlarged B. Here, the key problems are what requirements the trained classifier should meet and how to select cases from A - B. Theoretically, the trained classifier is required to have an accuracy of more than 0.5. We focus on the sample selection strategy from the uncertainty viewpoint, as shown in Algorithm 1.
Our learning scheme highlights that, whereas traditionally only group G3 is considered for improving learning performance, here both G3 and G1 are used.
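One round of uncertainty-based selection can be sketched roughly as follows. This is not Algorithm 1 itself: splitting the unlabeled cases into three equal-sized groups ranked by fuzziness, the argmax labeling rule, and the names `split_by_fuzziness` and `augment` are all assumptions made for illustration, and how these groups correspond to G1 and G3 is not specified in this section.

```python
import math

def fuzziness(memberships):
    """Average binary-entropy fuzziness of one membership vector."""
    def term(mu):
        # Convention 0 * log2(0) = 0.
        return -sum(p * math.log2(p) for p in (mu, 1.0 - mu) if p > 0.0)
    return sum(term(mu) for mu in memberships) / len(memberships)

def split_by_fuzziness(outputs, n_groups=3):
    """Return n_groups lists of case indices, from least to most fuzzy."""
    ranked = sorted(range(len(outputs)), key=lambda i: fuzziness(outputs[i]))
    size = math.ceil(len(ranked) / n_groups)
    return [ranked[g * size:(g + 1) * size] for g in range(n_groups)]

def augment(labeled, unlabeled, outputs, chosen):
    """Append the chosen unlabeled cases with their predicted (argmax) labels."""
    added = [
        (unlabeled[i], max(range(len(outputs[i])), key=outputs[i].__getitem__))
        for i in chosen
    ]
    return labeled + added

# Hypothetical data: B holds one labeled case; A - B holds six unlabeled cases
# with membership vectors predicted by a classifier trained on B.
labeled = [("b0", 0)]
unlabeled = ["u0", "u1", "u2", "u3", "u4", "u5"]
outputs = [
    (0.1, 0.0, 0.9), (0.6, 0.0, 0.4), (1.0, 0.0, 0.0),
    (0.3, 0.7, 0.0), (0.8, 0.2, 0.0), (0.0, 0.55, 0.45),
]

groups = split_by_fuzziness(outputs)
# Add the least-fuzzy and most-fuzzy groups, leaving the middle group out.
enlarged = augment(labeled, unlabeled, outputs, groups[0] + groups[2])
print(groups)
print(enlarged)
```

The classifier would then be retrained on the enlarged labeled set and the round repeated as needed.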
For demonstration, we collected a big data set for Chinese chess game scene classification. The file size is 1.86 GB, including more than $10^7$ records of chess games and more than $10^9$ chess game scenes. This is a typical semisupervised learning problem with unstructured data: there are numerous scenes that need to be labeled.
Complicated scene labeling usually requires senior