Computational Intelligence - February 2014 - 39

Figure 5 Empirical parameter setting for context-sensitive term associations. [Plot: average F-score (y-axis, 0.59 to 0.65) against the virtual text window size (x-axis, 2 to 22), with one curve per information flow quality threshold (0.01 to 0.50).]

available for cybercrime related messages. Given the large number of un-tagged messages of our corpora, we employed a semi-automated method for tag generation. Firstly, six annotators were responsible for tagging around 10% of our messages. Secondly, the cosine similarities between tagged messages and un-tagged messages were computed. Finally, an un-tagged message was assigned the tag of its most similar tagged message if the cosine similarity score was above a predefined threshold (i.e., 0.5). For the remaining messages without any tag assigned, we annotated them
with the generic tag "cybercrime". Similar to the approach adopted by Ramage et al. (2011), we specified that each tag was associated with 5 topics for the PLDA baseline system.

Another baseline system (SEED) employed 11 seeding transactional indicators, 16 seeding collaborative indicators, and the corresponding synonyms (top 3 synonyms of each seeding indicator) extracted from WordNet (Fellbaum, 1998) to identify cybercriminal relationships from messages. These relationship indicators were also used by the experimental system to infer the labels of mined latent concepts. The seeding relationship indicators and the cybercrime corpora were stemmed using the same Porter stemming algorithm (Porter, 1980). The SEED baseline system simply uses a transactional (collaborative) strength measure

$$tran(CP_i) = \frac{|TranInd| - |CollInd|}{|TranInd| + |CollInd|}$$

$$coll(CP_i) = \frac{|CollInd| - |TranInd|}{|TranInd| + |CollInd|}$$

to determine the relationship label of a test message $CP_i$, where $TranInd$ and $CollInd$ are the sets of transactional and collaborative indicators found in the message. For example, if $tran(CP_i) > \omega_{tran}$ ($coll(CP_i) > \omega_{coll}$) is true, the message is considered to be transactional (collaborative). The threshold $\omega_{tran}$ ($\omega_{coll}$) was empirically established for our experiments. In addition, classical supervised machine learning classifiers such as Support Vector Machines (SVM) with an RBF kernel (see footnote 4) and Conditional Random Fields (CRF) (see footnote 5) were also used. Stop word removal, case transformation, and stemming were applied to the cybercrime corpora before they were processed by the experimental and the baseline systems. For the SVM and CRF baseline systems, word-based features and TF-IDF term weighting were applied. In addition, part-of-speech, number of seeding transactional indicators, number of seeding collaborative indicators, and lexical features such as sentence length and lexical diversity were applied to the baseline systems. For the CRF baseline system, contextual information, such as the words preceding and following seeding relationship indicators or user names, was also used. A standard three-fold cross validation was applied to our experiments.

For the experimental system CSLDA, concepts representing transactional and collaborative cybercriminal relationships were first acquired via LDA-based latent topic modeling. The number of latent concepts $|Z|$ was estimated according to the perplexity measure (Steyvers et al., 2004). Following the empirical finding of Griffiths and Steyvers (Steyvers et al., 2004), the hyper-parameters $\alpha$ and $\beta$ of the context-sensitive Gibbs sampling algorithm were set to $50/|Z|$ and 0.1, respectively. Two independent Gibbs samples were used and the first three hundred Gibbs samples produced by our algorithm were ignored to ensure that the burn-in period had been bypassed. The maximum loop control of Gibbs sampling, $MaxI = 1,000$, was set to ensure proper convergence. The statistics of latent concept learning and Laplacian concept labeling of the experimental system are summarized in the second half of Table 1. For context-sensitive Gibbs sampling, the virtual text window size $\omega_{win}$ and the information flow quality threshold $\omega_{IF}$ for context-sensitive term association extraction (Lau et al., 2008) were empirically established based on the Twitter corpus. We tried different combinations of $\omega_{win}$ and $\omega_{IF}$, as shown in Figure 5, while fixing the values of the other system parameters. We found that $\omega_{win} = 6$ and $\omega_{IF} = 0.15$ led to the best F-measure, and we then applied these parameter values to the experiments based on the forum corpus as well. Based on the selection parameter $\omega_{IF} = 0.15$, there were 1,512 and 1,306 context-sensitive term associations extracted from the Twitter and the forum corpora, respectively.

It should be noted that our empirical parameter setting method may not be able to identify the global optimum for $\omega_{win}$, $\omega_{IF}$, $\omega_{rel}$, and other parameters. A more sophisticated parameter tuning method will only further improve the performance of our proposed computational method reported in this

4 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
5 http://crfpp.googlecode.com/svn/trunk/doc/index.html
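
The semi-automated tag generation step described earlier (assign an un-tagged message the tag of its most similar tagged message when the cosine similarity exceeds 0.5, otherwise fall back to the generic tag "cybercrime") might be sketched as follows. This is only an illustration: the text does not state how messages were vectorized, so raw term counts over whitespace tokens are assumed here, and the function names and sample messages are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_tags(tagged, untagged, threshold=0.5, fallback="cybercrime"):
    """Give each un-tagged message the tag of its most similar tagged message.

    tagged:   list of (text, tag) pairs produced by human annotators
    untagged: list of message texts
    Assumption: raw term counts on lower-cased whitespace tokens.
    """
    tagged_vectors = [(Counter(text.lower().split()), tag) for text, tag in tagged]
    labels = []
    for text in untagged:
        vec = Counter(text.lower().split())
        best_tag, best_sim = fallback, 0.0
        for tagged_vec, tag in tagged_vectors:
            sim = cosine(vec, tagged_vec)
            if sim > best_sim:
                best_tag, best_sim = tag, sim
        # The tag is kept only if the similarity is above the threshold.
        labels.append(best_tag if best_sim > threshold else fallback)
    return labels


# Hypothetical example messages and tags.
annotated = [
    ("selling stolen credit cards cheap", "carding"),
    ("ddos attack service for hire", "ddos"),
]
new_messages = ["stolen credit cards for sale", "weather is nice today"]
labels = propagate_tags(annotated, new_messages)
```

In this toy run the first new message shares three of five terms with the "carding" example (cosine 0.6), so it inherits that tag, while the second shares no terms with any annotated message and receives the generic tag.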
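
The strength measure used by the SEED baseline can be illustrated with a short sketch. The indicator words and threshold values below are hypothetical placeholders (the actual indicators and the empirically established thresholds are not listed in this section), and the handling of messages with no indicators, or with neither threshold exceeded, is an assumption rather than something stated in the text.

```python
def strength_scores(n_tran: int, n_coll: int):
    """Return (tran, coll) strength for given indicator counts.

    tran = (|TranInd| - |CollInd|) / (|TranInd| + |CollInd|), and coll = -tran.
    Assumption: both scores are 0.0 when no indicator is found.
    """
    total = n_tran + n_coll
    if total == 0:
        return 0.0, 0.0
    tran = (n_tran - n_coll) / total
    return tran, -tran


def seed_label(tokens, tran_indicators, coll_indicators, w_tran, w_coll):
    """Label a stemmed, tokenized message using the strength measure.

    Returns "transactional", "collaborative", or None (assumed outcome when
    neither threshold is exceeded).
    """
    found_tran = set(tokens) & tran_indicators   # TranInd for this message
    found_coll = set(tokens) & coll_indicators   # CollInd for this message
    tran, coll = strength_scores(len(found_tran), len(found_coll))
    if tran > w_tran:
        return "transactional"
    if coll > w_coll:
        return "collaborative"
    return None


# Hypothetical indicator sets and thresholds, for illustration only.
TRAN = {"sell", "buy", "price"}
COLL = {"team", "join"}
label = seed_label(["sell", "price", "join"], TRAN, COLL, w_tran=0.2, w_coll=0.2)
```

Because the two scores are negatives of each other, a message cannot exceed both thresholds at once as long as the thresholds are non-negative.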
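
The empirical parameter setting procedure (trying combinations of the virtual text window size and the information flow quality threshold while holding other parameters fixed, then keeping the combination with the best F-measure) amounts to an exhaustive grid search. A minimal sketch is given below. The `evaluate` callback is a hypothetical stand-in for a full train-and-evaluate run, the grid values are read off the axis ticks of Figure 5, and the toy scoring function is synthetic: it is constructed to peak at the reported setting and does not reproduce any measured result.

```python
from itertools import product


def grid_search(evaluate, win_values, if_values):
    """Try every (window size, IF threshold) pair and keep the best score.

    evaluate: callable (win, flow) -> F-measure; hypothetical stand-in for
              running the full system with all other parameters fixed.
    Returns (best_score, best_win, best_flow).
    """
    best = None
    for win, flow in product(win_values, if_values):
        score = evaluate(win, flow)
        if best is None or score > best[0]:
            best = (score, win, flow)
    return best


# Grid values taken from the tick marks of Figure 5.
WIN_GRID = list(range(2, 23, 2))
IF_GRID = [0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]


def toy_f_measure(win, flow):
    """Synthetic score, NOT real data: a smooth bump centred on (6, 0.15)."""
    return -((win - 6) ** 2 / 100 + (flow - 0.15) ** 2)


best_score, best_win, best_flow = grid_search(toy_f_measure, WIN_GRID, IF_GRID)
```

A grid search of this kind only finds the best point on the chosen grid, which is consistent with the caveat above that the global optimum may not be identified.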

February 2014 | IEEE Computational Intelligence Magazine
