IEEE Computational Intelligence Magazine - February 2020 - 80

TABLE I B4MSA parameters used per language.

PARAMETER: DEFAULT; ARABIC; ENGLISH; SPANISH

TEXT TRANSFORMATION
REMOVE DIACRITICS: YES; YES; NO; YES
REMOVE DUPLICATES: YES
REMOVE PUNCTUATION: YES
LOWERCASE: YES
EMOTICONS: GROUP
NUMBERS: GROUP; GROUP; DELETE
URLS: GROUP
USERS: GROUP
HASHTAG, ENTITIES: NONE; NONE; DELETE; NONE; FALSE; DELETE; FALSE
NEGATION, STOPWORDS: GROUP; NONE; FALSE
STEMMING: FALSE; FALSE

TOKENIZERS
N-WORDS: {1, 2}; {1}; {1, 2}; {1}
SKIP-GRAMS: { }; { }; {(3, 1)}; {(2, 1)}
Q-GRAMS: {2, 3, 4}; {2, 3, 4}; {3, 4}; {2, 3, 4, 5, 6}


procedure used to train EvoDAG. It receives the first-stage text models, m ∈ M, and TR. In lines 2-9, it iterates over the text models, m, transforming the text into vectors (line 3); these vectors are used in k-fold cross-validation (lines 5-8) to predict the decision function values of the validation set (vs). In each fold, there are two disjoint sets, tr and vs, where tr is used to train an SVM (line 6), and vs is the set to be predicted (line 7). The predictions obtained for the different models, M, are concatenated (line 9) to form EvoDAG's training set. The last step is to train EvoDAG (line 11) with the predicted values.
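
The procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the SVM is replaced by a stand-in nearest-centroid scorer so that the example needs no external library, the folds are assigned by index modulo k, and all names (fit_centroid_scorer, stacked_training_set, count_model, length_model) are hypothetical. The final step, training EvoDAG on the stacked values, is not shown.

```python
from typing import Callable, List, Sequence

Vector = List[float]
TextModel = Callable[[str], Vector]


def fit_centroid_scorer(X: Sequence[Vector], y: Sequence[int], classes: Sequence[int]):
    """Stand-in for the SVM (an assumption of this sketch).

    Returns a function giving one decision value per class: the negative
    squared distance to that class's centroid. Assumes every class appears
    in the training split.
    """
    centroids = []
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids.append([sum(col) / len(rows) for col in zip(*rows)])

    def decision_function(x: Vector) -> Vector:
        return [-sum((a - b) ** 2 for a, b in zip(x, cen)) for cen in centroids]

    return decision_function


def stacked_training_set(models: Sequence[TextModel], texts: Sequence[str],
                         labels: Sequence[int], k: int = 3) -> List[Vector]:
    n = len(texts)
    classes = sorted(set(labels))
    stacked: List[Vector] = [[] for _ in range(n)]
    for m in models:
        X = [m(t) for t in texts]                     # text -> vectors
        for fold in range(k):                         # k-fold cross-validation
            vs = [i for i in range(n) if i % k == fold]
            tr = [i for i in range(n) if i % k != fold]
            g = fit_centroid_scorer([X[i] for i in tr],
                                    [labels[i] for i in tr], classes)
            for i in vs:                              # predict only unseen items
                stacked[i].extend(g(X[i]))            # concatenate across models
    return stacked


def count_model(text: str) -> Vector:
    return [float(text.count("good")), float(text.count("bad"))]


def length_model(text: str) -> Vector:
    return [float(len(text.split())), float(len(text))]


texts = ["good day", "bad day", "good good", "bad bad", "good film", "bad film"]
labels = [1, 0, 1, 0, 1, 0]
Z = stacked_training_set([count_model, length_model], texts, labels, k=3)
```

Each row of Z holds the out-of-fold decision values of every first-stage model for one text, so no value was produced by a scorer that saw that text during training.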
The rest of this section describes the different text models, m, used in this contribution. It starts with B4MSA using two datasets, followed by the lexicon-based models, Emoji Space, and FastText. The last subsection is devoted to describing EvoDAG, the classifier used in EvoMSA's second stage.
A. B4MSA

The first two text models, i.e., m_1 and m_2, use our baseline for multilingual sentiment analysis, namely B4MSA [44] (https://github.com/INGEOTEC/b4msa). B4MSA uses a structure equivalent to that of the models used in EvoMSA's first stage, i.e., g_b ∘ m_b. Function m_b uses a series of simple language-independent text transformations to convert text into tokens, as well as some language-dependent transformations commonly implemented in various open-source libraries. Nonetheless, it avoids computationally expensive linguistic tasks such as part-of-speech tagging and dependency parsing, among others. Then, these tokens are represented in a vector space model using TF-IDF, and, finally, the vectors and their associated classes are learned by a linear SVM (i.e., g_b).
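
As a rough illustration of the TF-IDF step, the following sketch weights tokens and normalizes the resulting vector. The exact weighting used by B4MSA is not stated here, so the formula (term frequency times log(N/df), followed by L2 normalization) is an assumption, and the function names fit_idf and tfidf_vector are hypothetical. The linear SVM that consumes these vectors is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def fit_idf(token_lists: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Inverse document frequency, log(N / df), learned from tokenized texts."""
    n_docs = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log(n_docs / count) for tok, count in df.items()}


def tfidf_vector(tokens: Sequence[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector; tokens unseen in training are ignored."""
    tf = Counter(tok for tok in tokens if tok in idf)
    weights = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {tok: w / norm for tok, w in weights.items()} if norm > 0 else {}


docs: List[List[str]] = [["good", "day"], ["bad", "day"], ["good", "film"]]
idf = fit_idf(docs)
vec = tfidf_vector(["bad", "day", "unseen"], idf)
```

In this toy corpus "bad" occurs in fewer documents than "day", so it receives the larger weight in vec.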
B4MSA was conceived to serve as a baseline for text categorization. To achieve this, it starts with a search in its parameter space to find an acceptable configuration. However, running this search for each problem increases the time required to find a model; moreover, our previous work on sentiment analysis (see [45]) indicates that some parameters can be fixed with minimal impact on performance. Consequently, it was decided to keep B4MSA's parameters constant per language.
Table I shows B4MSA's parameters per language. These parameters were obtained by measuring their performance (using macro-F1) on all the datasets used in this contribution, using k-fold cross-validation (k = 5) on the training set. The parameter space was sampled using a loop of two steps. In the first step, the parameters varied were the tokenizers; all the combinations of n-words 1, 2, and 3; skip-grams (3, 1), (2, 2), and (2, 1); and q-grams 2, 3, 4, 5, and 6 were tested. The second step tested the rest of the parameters shown in the table; these parameters are either dichotomous or treated as such, as is the case for parameters with possible values like group or delete. This process continues until a stable configuration is found, that is, until the best configuration is the one found in the previous step.
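
The two-step loop can be sketched as follows. This is a minimal illustration under assumptions: the second step is taken to change one parameter at a time (the text does not say how the remaining parameters were combined), the score function is a toy stand-in for macro-F1 under 5-fold cross-validation, and all names (alternating_search, toy_score, TARGET) are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def nonempty_subsets(options: Sequence[str]) -> List[frozenset]:
    """All non-empty combinations of the tokenizer options."""
    return [frozenset(c) for r in range(1, len(options) + 1)
            for c in combinations(options, r)]


def alternating_search(tokenizer_options: Sequence[str],
                       other_params: Dict[str, list],
                       score: Callable[[dict], float],
                       start: dict):
    best, best_score = dict(start), score(start)
    while True:
        previous = dict(best)
        # Step 1: vary only the tokenizers, keeping the other parameters fixed.
        for subset in nonempty_subsets(tokenizer_options):
            candidate = {**best, "tokenizers": subset}
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        # Step 2: vary the remaining parameters (one at a time, by assumption).
        for name, values in other_params.items():
            for value in values:
                candidate = {**best, name: value}
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        # Stable: a full pass found nothing better than the previous pass.
        if best == previous:
            return best, best_score


# Toy score standing in for cross-validated macro-F1.
TARGET = frozenset({"n-words 1", "q-grams 3"})


def toy_score(cfg: dict) -> int:
    return (-len(cfg["tokenizers"] ^ TARGET)
            + (cfg["lowercase"] is True)
            + (cfg["numbers"] == "delete"))


start = {"tokenizers": frozenset({"n-words 1"}), "lowercase": False, "numbers": "group"}
best, best_score = alternating_search(
    ["n-words 1", "n-words 2", "q-grams 3"],
    {"lowercase": [True, False], "numbers": ["group", "delete"]},
    toy_score, start)
```

Because a candidate replaces the current best only when its score is strictly higher, and the space is finite, the loop is guaranteed to stop.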
Some of B4MSA's parameters are self-explanatory, such as removing diacritics, duplicates, and punctuation symbols, and converting text to lowercase. Emoticons are replaced with the words _pos, _neg, or _neu depending on the polarity expressed. Numbers, URLs, and users are either deleted or replaced with the words _num, _url, and _usr, respectively. The tokens are words, bigrams of words, q-grams of different sizes, and skip-grams. The notation used for skip-grams is (a, b), where a indicates the number of words and b indicates the length of the skip; for example, in "have a nice weekend" the skip-gram (2, 1) would produce "have nice" and "a weekend".
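
The transformations and tokenizers described above can be sketched with the standard library alone. This is not B4MSA's code: the regular expressions, the three-entry emoticon lexicon, the treatment of q-grams as character q-grams, and the generalization of skip-grams to more than two words are all assumptions made for illustration, and only the "group" option is shown.

```python
import re
from typing import List

# Tiny hypothetical emoticon lexicon; a real one would be far larger.
EMOTICONS = {":)": "_pos", ":(": "_neg", ":|": "_neu"}


def normalize(text: str) -> str:
    """Lowercase, then group URLs, users, numbers, and emoticons."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "_url", text)       # URLs first: they may contain digits
    text = re.sub(r"@\w+", "_usr", text)
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "_num", text)
    for emoticon, label in EMOTICONS.items():
        text = text.replace(emoticon, label)
    return text


def n_words(words: List[str], n: int) -> List[str]:
    """Contiguous sequences of n words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def q_grams(text: str, q: int) -> List[str]:
    """Contiguous sequences of q characters."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def skip_grams(words: List[str], a: int, b: int) -> List[str]:
    """Skip-gram (a, b): a words, skipping b words between consecutive picks."""
    step = b + 1
    span = (a - 1) * step
    return [" ".join(words[i + j * step] for j in range(a))
            for i in range(len(words) - span)]


words = "have a nice weekend".split()
print(normalize("Thanks @Ana :) see http://example.com at 10"))
print(n_words(words, 2))
print(skip_grams(words, 2, 1))   # ['have nice', 'a weekend']
```

The last call reproduces the example in the text: with (2, 1), each token pairs a word with the one two positions later.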
B4MSA is used to create two models (g ∘ m_1 and g ∘ m_2), one using the competition training set (TR) and the other using a human-annotated (HA) dataset. Regarding TR, m_1 = m_b, i.e., m_1 is B4MSA's text model, and, as a result, EvoMSA's first model is g ∘ m_b. On the other hand, the HA dataset is composed of texts and their associated polarity (negative, neutral, or positive), and it is not related to TR. Consequently, it is feasible to create a text classifier that outputs the polarity of a given text. That is, m_2 = g_b ∘ m_b, where m_b is B4MSA's text model (using the parameters shown in Table I) and g_b is a linear SVM trained on HA; therefore, EvoMSA's second model is g ∘ g_b ∘ m_b.
B. Lexicon-Based Model

The text model, m_3, introduces external
knowledge into our approach by the use

