Computational Intelligence - November 2012 - 23

Abstract: Biology is in the middle of a data explosion. The technical advances
achieved by genomics, metabolomics, transcriptomics and proteomics technologies in recent years have significantly increased the amount of data available
for biologists to analyze different aspects of an organism. However, *omics data sets
pose several additional problems: they have inherent biological complexity and may
contain significant amounts of noise as well as measurement artifacts. Extracting
information from such databases has once again become a challenge. This
requires novel computational techniques and models to automatically perform data
mining tasks such as integration of different data types, clustering and knowledge
discovery, among others. In this article, we present a novel integrated computational intelligence approach for biological data mining that involves neural networks
and evolutionary computation. We propose the use of self-organizing maps for the
identification of coordinated pattern variations; a new training algorithm that can
include a priori biological information to obtain more biologically meaningful clusters; a validation measure that can assess the biological significance of the clusters
found; and finally, an evolutionary algorithm for the inference of unknown metabolic pathways involving the selected clusters.

1. Introduction

Modern biology studies generate a large amount of
data that require dedicated computational tools
for their analysis. Data integration is also gaining
importance given the need to extract knowledge from multiple data types and sources, with the aim of
gaining insight into the genetic processes underlying them
[1], [2], [3]. In fact, since the completion of genome sequences, functional identification of unknown genes has become a
principal challenge in systems biology. Bioinformatics plays
an important role here, allowing biologists to make full use of
the advances in computer science in analyzing large and
complex datasets.
At the beginning of the genomics revolution, bioinformatics referred only to the creation and management of large databases to store biological data. However, the discipline has
evolved over time, mainly from the application and adaptation of classical statistical methods and standard clustering
algorithms, such as hierarchical clustering (HC) and
k-means (KM) [4], [5], [6], towards more recent
approaches based on computational intelligence [7], [8],
[9], with promising results. The application of these approaches to bioinformatics problems, however, has gained popularity only recently [10].
From an application point of view, a current trend is to
achieve integration of different types of biological data to
reveal hidden correlations between them, allowing the inference of new knowledge regarding the biological processes
that affect them. However, the discovery of hidden patterns in
such data is currently a challenge because the use of any type
of algorithm for pattern recognition is hampered by a limited
number of samples and a very high number of dimensions.
Besides, biological data sets may have significant amounts of
noise as well as measurement artifacts. This highlights the
need to develop new techniques aimed at overcoming the
limitations of existing ones. New computational models to
perform several data mining tasks, such as integration of
different data types, unsupervised clustering and knowledge
discovery, are required.
In this article we will present a novel integrated computational intelligence approach for biological data mining
(Figure 1). It involves the use and application of two of the
most important and well-tested techniques in the computational intelligence field: neural networks and evolutionary algorithms. The different models and techniques involved in the
proposed approach could be used separately, since they tackle
different data mining aspects that can be treated as separate
problems: data pre-processing and integration, clustering, cluster validation and selection, and pathway search. We will show
the integration among them for the purpose of data mining
and knowledge discovery in biological data. We will present
and explain each step of the proposed approach in detail, using
as a case study for its application a real biological data set of
Arabidopsis thaliana, which is the model species of current plant
genomics research.
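To make the clustering stage more concrete, the following is a minimal sketch of a classical self-organizing map written in plain Python. It is not the training algorithm proposed in this article (which additionally incorporates a priori biological information); it only illustrates how a standard map groups patterns that vary in a coordinated way. The function names, the toy profiles, the map size and all parameter values are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns a dict {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each unit with a copy of a randomly chosen input pattern.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        rng.shuffle(order)
        for i in order:
            x = data[i]
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical toy data: three "rising" and three "falling" profiles.
profiles = [
    [0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [0.0, 0.9, 2.0],
    [2.0, 1.0, 0.0], [2.1, 1.1, 0.1], [2.0, 0.9, 0.0],
]
som = train_som(profiles, rows=1, cols=2)
units = [best_matching_unit(som, p) for p in profiles]
print(units)
```

In this toy setting, profiles with the same trend should end up on the same map unit, which is the sense in which a map unit can be read as a cluster of coordinated patterns.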
The first step involves obtaining and selecting the
biological data: the kind and number of data types and sources,
such as microarray experiments, metabolic profiles and pathway
information, among others; the number of experiments and
repetitions for each dataset; and the structure and type of the
data files that contain them. It also
requires cleaning the data and eliminating artifacts, as well as
applying appropriate selection criteria with the objective of including only sufficiently expressed data [11]. This step
also needs a treatment of the expression intensity values over
the control sample in the case of data coming from several
experimental sources [12] [Figure 1(a)]. The next stage requires
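As an illustration of the pre-processing step described in this paragraph, the sketch below filters out weakly expressed entries and then expresses each remaining intensity relative to its control sample. The text does not specify the exact selection criterion or transformation; the intensity threshold and the log2 ratio used here are assumptions (a common choice for expression data), and the function and variable names are hypothetical.

```python
import math


def log_ratio_profiles(samples, control, min_intensity=1.0):
    """Drop weakly expressed entries and express the rest relative to the control."""
    profiles = {}
    for name, values in samples.items():
        ref = control[name]
        # Assumed selection criterion: the control and every measurement
        # must reach min_intensity for the entry to be kept.
        if ref < min_intensity or any(v < min_intensity for v in values):
            continue
        # Assumed treatment: log2 ratio of each intensity over the control.
        profiles[name] = [math.log2(v / ref) for v in values]
    return profiles


# Hypothetical raw intensities for two transcripts and one metabolite.
raw = {"geneA": [8.0, 16.0, 4.0], "geneB": [0.2, 0.1, 0.3], "metX": [2.0, 2.0, 8.0]}
ctrl = {"geneA": 4.0, "geneB": 0.25, "metX": 2.0}
profiles = log_ratio_profiles(raw, ctrl)
print(profiles)
```

Expressing every data source relative to its own control puts transcripts and metabolites on a comparable scale, which is what allows them to be clustered together in later stages.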

NOVEMBER 2012 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE
