IEEE Systems, Man and Cybernetics Magazine - April 2020 - 29
Clustering Algorithms
Although the techniques described in the previous section
(sVAT, PSVAT, o-VAT, and so on) help us understand the
cluster structure of big data and determine the optimal
value of k (the number of clusters to seek based on the
visual evidence), they do not partition the data into k clusters. To address this, Havens et al. [79] extended the sVAT
algorithm so that it returned SL partitions of the big data
set. They named this algorithm sVAT-SL, which calculates
an SL partition of the sVAT-sampled data and then extends
this partition to the entire data set using a nearest (object)
prototype rule (NPR).
They showed sVAT-SL to be a scalable instantiation of SL
clustering for data sets that contain k compact-separated
clusters in the sense of Dunn's index. For data sets that do
not contain compact-separated clusters, sVAT-SL produces
a good approximation of SL partitions. Kumar et al. [80], [81]
renamed the sVAT-SL as the clusiVAT algorithm (algorithm S7 in "Pseudocode for Various Algorithms Belonging
to the Visual Assessment of Tendency Family"), and their
numerical experiments showed its superiority in terms of
cluster quality and computation time over several popular
big-data-clustering algorithms, such as MSTs built with the
Filter-Kruskal algorithm, k-means, single-pass k-means,
online k-means, and clustering using representatives.
It is quite fast to apply clusiVAT to a data set in which
the objects are represented by their feature vectors and
the Euclidean distance is used as a distance measure
between objects, since it was developed with the assumption that the distance function can be computed quickly
and can be performed as a batch operation; that is, the
Euclidean distance of a data point from M & 1 data points
can be computed as a single operation using matrix properties. This assumption, however, does not hold for many
distance measures applicable to graphs and time series
[83], [84]. There are many distance measures relevant to
problems in different domains that are computationally
expensive and can be computed in a pairwise manner
400
300
200
100
0
-100
-200
-300
-400
(b)
(c)
0
40
30
0
20
0
0
10
0
-500
-4
00
-3
00
-2
00
-1
00
N # N (unordered) dissimilarity matrix based on the criteria of selecting the most dissimilar value from the selected row of the dissimilarity matrix. Next, the VAT algorithm
is applied to the samples to obtain the RDI, which helps in
understanding the cluster structure of the big data. This
technique becomes time and memory intensive due to the
calculation of the N # N (unordered) dissimilarity matrix
for very large values of N.
A different sampling-based scheme to make VAT applicable to large-volume data sets, called out-of-core VAT (o-VAT ),
was proposed in [73]. The preprocessing step of o-VAT compresses the data set by merging data points into small
groups called buckets, each of which is then represented by
the mean of its contents. The process starts with a single
empty bucket and, based on the distance of the new point
from existing buckets being smaller or greater than some
user-defined threshold, called the confidence radius (|), it
is added to an existing bucket or placed in a new bucket of
its own. This process leads to a set of n buckets, which represents the complete data set in compressed form. The higher the value of |, the higher the number of buckets n, and
vice versa. The VAT algorithm is then applied to the set of n
buckets to infer the cluster structure of the big data.
Recently, Trang et al. [74] used a probabilistic traversing
sampling (ProTraS) algorithm [75], which is based on the farthest-first traversal principle, in which a representative is
selected that yields the highest probability of cost reduction.
Each point of the sample points obtained by ProTraS is then
replaced by the center of the set of patterns represented by
the point, thereby moving the sample toward the center of
the clusters. The VAT/iVAT image of the new sample is thus
sharper while the main structure of clusters is maintained.
Shao et al. [76] proposed a VAT-based end-to-end clustering
algorithm, called hybrid and parameter-free clustering method, which first samples the large data set using a fast version
of MMRS, which they dubbed MMRS*. A more general form of
this sampling scheme called nearly maximin sampling first
appeared in [77]. Subsequently, this method was used in [78],
where it was called MMRS plus (MMRS+) and used as a basis
for the siVAT+ algorithm. Shao et al. [76] used the same
improvement to MMRS, followed by a different visual assessment image built by eVAT. The method, then, determines the
number of clusters using the extraction of pixel blocks, forms
different partitions (MST tree cutting), and extends the results
to the rest of the data set.
(a)
(d)
Figure 9. The data scatterplot and VAT, sVAT, and siVAT
images for big data Gaussian clusters. The (a) data set
of N = 1,000,000, (b) VAT for N = 1,000,000, (c) sVAT for
n = 500, and (d) siVAT for n = 500.
Ap ri l 2020
IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE
29
IEEE Systems, Man and Cybernetics Magazine - April 2020
Table of Contents for the Digital Edition of IEEE Systems, Man and Cybernetics Magazine - April 2020
Contents
IEEE Systems, Man and Cybernetics Magazine - April 2020 - Cover1
IEEE Systems, Man and Cybernetics Magazine - April 2020 - Cover2
IEEE Systems, Man and Cybernetics Magazine - April 2020 - Contents
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 2
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 3
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 4
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 5
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 6
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 7
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 8
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 9
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 10
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 11
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 12
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 13
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 14
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 15
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 16
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 17
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 18
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 19
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 20
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 21
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 22
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 23
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 24
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 25
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 26
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 27
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 28
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 29
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 30
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 31
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 32
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 33
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 34
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 35
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 36
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 37
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 38
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 39
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 40
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 41
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 42
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 43
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 44
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 45
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 46
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 47
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 48
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 49
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 50
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 51
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 52
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 53
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 54
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 55
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 56
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 57
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 58
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 59
IEEE Systems, Man and Cybernetics Magazine - April 2020 - 60
IEEE Systems, Man and Cybernetics Magazine - April 2020 - Cover3
IEEE Systems, Man and Cybernetics Magazine - April 2020 - Cover4
https://www.nxtbook.com/nxtbooks/ieee/smc_202310
https://www.nxtbook.com/nxtbooks/ieee/smc_202307
https://www.nxtbook.com/nxtbooks/ieee/smc_202304
https://www.nxtbook.com/nxtbooks/ieee/smc_202301
https://www.nxtbook.com/nxtbooks/ieee/smc_202210
https://www.nxtbook.com/nxtbooks/ieee/smc_202207
https://www.nxtbook.com/nxtbooks/ieee/smc_202204
https://www.nxtbook.com/nxtbooks/ieee/smc_202201
https://www.nxtbook.com/nxtbooks/ieee/smc_202110
https://www.nxtbook.com/nxtbooks/ieee/smc_202107
https://www.nxtbook.com/nxtbooks/ieee/smc_202104
https://www.nxtbook.com/nxtbooks/ieee/smc_202101
https://www.nxtbook.com/nxtbooks/ieee/smc_202010
https://www.nxtbook.com/nxtbooks/ieee/smc_202007
https://www.nxtbook.com/nxtbooks/ieee/smc_202004
https://www.nxtbook.com/nxtbooks/ieee/smc_202001
https://www.nxtbook.com/nxtbooks/ieee/smc_201910
https://www.nxtbook.com/nxtbooks/ieee/smc_201907
https://www.nxtbook.com/nxtbooks/ieee/smc_201904
https://www.nxtbook.com/nxtbooks/ieee/smc_201901
https://www.nxtbook.com/nxtbooks/ieee/smc_201810
https://www.nxtbook.com/nxtbooks/ieee/smc_201807
https://www.nxtbook.com/nxtbooks/ieee/smc_201804
https://www.nxtbook.com/nxtbooks/ieee/smc_201801
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_1017
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0717
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0417
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0117
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_1016
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0716
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0416
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0116
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_1015
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0715
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0415
https://www.nxtbook.com/nxtbooks/ieee/systems_man_cybernetics_0115
https://www.nxtbookmedia.com