
Online Resource for ENN:
http://www.ele.uri.edu/faculty/he/ENN/ENN.html
Supplementary materials and a source code implementation of the ENN classifier are provided. The source code includes all three versions of the ENN classifier, ENN, ENN.V1, and ENN.V2, in Matlab M-file format.

chemical data analysis [7], disease classification and clinical outcome prediction [8], among others. In addition to pattern recognition, KNN-based methods are also widely used for clustering [9], regression [10], density estimation [11], and outlier detection [12], to name a few. In fact, KNN has been identified as one of the "top 10 algorithms in data mining" at the IEEE International Conference on Data Mining (ICDM) held in Hong Kong in 2006 [13].
There are several important issues in the KNN method that have been of great interest since it was originally proposed. The first is how to choose the user-defined parameter k, the number of nearest neighbors. A straightforward way to determine the best k is cross validation: try several values of k and choose the one that gives the best validation performance [14].
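As a minimal sketch of this selection procedure (assuming scikit-learn's KNeighborsClassifier and cross_val_score are available; the data set below is only a placeholder, not one used in this article):

```python
# Minimal sketch: choose k by cross validation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = range(1, 16)
mean_scores = []
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross validation; mean accuracy is the selection criterion.
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores.append(scores.mean())

best_k = candidate_ks[int(np.argmax(mean_scores))]
print("best k:", best_k)
```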
The second issue is how to determine an appropriate dissimilarity measure. The Euclidean distance is commonly used for this purpose, but it is easily influenced by irrelevant features and therefore suffers from noisy samples, especially with high-dimensional data (i.e., the curse of dimensionality). Feature normalization and dimensionality reduction methods can remedy this issue to some extent. More recently, learning a distance metric from training data has attracted great interest as a way to improve the prediction performance of KNN-based methods [15]-[21]. For example, Goldberger et al. proposed a stochastic nearest neighbor approach that learns a distance metric by optimizing leave-one-out performance on the training data [15]. Weinberger et al. proposed to learn a Mahalanobis distance metric by semidefinite programming [16], [17]. Hastie and Tibshirani used local linear discriminant analysis to estimate distance metrics for computing neighbors and building local decision boundaries for classification [21].
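For illustration only, one simple way to experiment with a non-Euclidean metric is to plug a Mahalanobis distance into an off-the-shelf KNN classifier. The sketch below assumes scikit-learn is available and uses the inverse covariance of the (normalized) features as a stand-in for a learned metric; it is not any of the learned metrics cited above:

```python
# Sketch: KNN with a Mahalanobis distance instead of the Euclidean distance.
# The inverse covariance matrix is only a placeholder for a learned metric.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)        # feature normalization, as discussed above

VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance of the features
knn = KNeighborsClassifier(
    n_neighbors=5,
    metric="mahalanobis",
    metric_params={"VI": VI},
    algorithm="brute",                       # brute-force search supports this metric
)
knn.fit(X, y)
print("training accuracy:", knn.score(X, y))
```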
The third issue in the KNN method is its high computational complexity for large data sets. To address this issue, Gowda et al. proposed a condensed nearest neighbor approach that uses the concept of mutual nearest neighbors to reduce the size of the training data used for decision making [22]. Bagui et al. employed a ranking concept to reduce the computational complexity [23]. To speed up the search for nearest neighbors and overcome memory limitations, many data structures, often referred to as fast KNN methods, have been proposed to implement KNN, such as the k-d tree [24], nearest feature line [25], orthogonal search tree [26], ball tree [27], and principal axis search tree [28], to name a few.
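As a brief illustration of such structures (assuming SciPy is installed; synthetic data only), a k-d tree can be built once and then queried for the k nearest neighbors of many points much faster than a brute-force scan in low to moderate dimensions:

```python
# Sketch: fast nearest neighbor search with a k-d tree (SciPy's cKDTree).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
train = rng.standard_normal((10_000, 3))   # 10,000 training points in 3-D
query = rng.standard_normal((5, 3))        # a few query points

tree = cKDTree(train)                      # build the tree once
dist, idx = tree.query(query, k=5)         # 5 nearest neighbors of each query point
print(idx.shape)                           # (5, 5): neighbor indices per query
```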
In general, two types of errors may occur in KNN-based methods: for samples in areas of higher density, the k nearest neighbors could lie in areas of slightly lower density, or vice versa [29]. These errors may result in misclassification, especially when the nearest neighbors are dominated by samples from other classes. Error rates increase further when sample sizes are small or the data are imbalanced (see a comprehensive survey on imbalanced learning [30]). In this paper, we introduce the extended nearest neighbor (ENN) method, a new approach based on generalized class-wise statistics that represent intra-class coherence. Unlike the classic KNN method, where only the nearest neighbors are considered for classification, the proposed ENN method takes advantage of all available training data to make a classification decision: it assigns a class membership to an unknown sample so as to maximize the intra-class coherence. By exploiting the information from all available data to maximize the intra-class coherence, ENN is able to learn from the global distribution and thereby improve pattern recognition performance.
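The class-wise statistics are defined formally later in the paper. As a rough, non-authoritative sketch of the idea just described (illustrative helper functions, Euclidean distance, and a brute-force neighbor search assumed, not the paper's exact formulation), intra-class coherence can be measured as the fraction of each class's training samples whose k nearest neighbors, within the training set augmented by the query point, come from their own class; the query is then assigned the label that maximizes the total coherence:

```python
# Rough sketch of the ENN idea (illustrative only; the exact class-wise
# statistics are defined later in the paper).
import numpy as np

def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding itself."""
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf
    return np.argsort(d)[:k]

def intra_class_coherence(X, y, k):
    """Sum over classes of the fraction of each class's samples' k nearest
    neighbors that also belong to that class."""
    total = 0.0
    for c in np.unique(y):
        members = np.where(y == c)[0]
        same = sum(np.sum(y[knn_indices(X, i, k)] == c) for i in members)
        total += same / (len(members) * k)
    return total

def enn_predict(X_train, y_train, z, k=3):
    """Assign the query z (a 1-D feature vector) the label that maximizes the
    total intra-class coherence of the training set augmented with z."""
    scores = {}
    for c in np.unique(y_train):
        X_aug = np.vstack([X_train, z])
        y_aug = np.append(y_train, c)
        scores[c] = intra_class_coherence(X_aug, y_aug, k)
    return max(scores, key=scores.get)
```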

II. Problem Formulation: Limitations of KNN

The reason for the "two types of errors" [29] is that the KNN method is sensitive to the scale or variance of the distributions of the predefined classes [31]. The nearest neighbors of an unknown observation will tend to be dominated by the class with the highest density. For example, in a two-class classification problem, if we assume that class 1 has a smaller variance than class 2, then the class 1 samples dominate their neighborhood with higher density (i.e., a more concentrated distribution), whereas the class 2 samples are distributed in regions of lower density (i.e., a more spread-out distribution). In this case, for class 2 samples, the number of nearest neighbors from class 1 may exceed that from class 2. For instance, class 2 samples that are close to the region of class 1 may have a large number of class 1 neighbors because of the higher density of class 1. Only for those class 2 samples that are far away from the high-density region of class 1 will the nearest neighbors from class 2 be dominant.
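This sensitivity to class variance is easy to reproduce numerically. The following sketch (synthetic data only; scikit-learn and SciPy assumed) draws samples from two one-dimensional Gaussians with equal priors but different variances and reports how often KNN disagrees with the Bayes-optimal decision:

```python
# Sketch: KNN vs. the Bayes rule for two Gaussians with different variances.
# Illustrates how KNN favors the denser (smaller-variance) class.
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(loc=0.0, scale=0.5, size=n)   # class 1: small variance (dense)
x2 = rng.normal(loc=1.0, scale=2.0, size=n)   # class 2: large variance (spread out)
X = np.concatenate([x1, x2]).reshape(-1, 1)
y = np.array([1] * n + [2] * n)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

test = np.linspace(-4, 6, 201).reshape(-1, 1)
knn_pred = knn.predict(test)
# Bayes rule with equal priors: pick the class with the larger class-conditional density.
bayes_pred = np.where(norm.pdf(test[:, 0], 0.0, 0.5) >
                      norm.pdf(test[:, 0], 1.0, 2.0), 1, 2)
print("fraction of test points where KNN disagrees with Bayes:",
      np.mean(knn_pred != bayes_pred))
```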
Fig. 1 shows one example (a two-class classification scenario) of decision making in the classic KNN method for two Gaussian distributions with different means and variances, in which we assume that the prior probabilities p(ω) are the same. In this figure, the x-axis represents the data value (one-dimensional data in this case), and the y-axis represents the class-conditional probability p(x|ω). For the eight data points illustrated here, we assume that data points x1, x2, x3, and x4 are sampled from the class 1 distribution, and data points x5, x6, x7, and x8 are sampled from the class 2 distribution. On the side of the figure, we also list their corresponding k1/k and k2/k ratios (here k = 5 is the parameter of KNN, and k1 and k2 are the numbers of nearest neighbors belonging to class 1 and class 2, respectively). In this case, the Bayesian method can correctly classify all eight data points by calculating their corresponding posterior probabilities according to Bayes' theorem.
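For reference, the posterior in this comparison follows directly from Bayes' theorem, and with equal priors the decision reduces to comparing the class-conditional densities:

```latex
% Bayes decision rule used in the comparison above; with equal priors the
% prior and denominator cancel, leaving a comparison of p(x | omega_i).
\[
  p(\omega_i \mid x) \;=\; \frac{p(x \mid \omega_i)\, p(\omega_i)}
                                {\sum_{j} p(x \mid \omega_j)\, p(\omega_j)},
  \qquad
  \hat{\omega}(x) \;=\; \arg\max_{i}\, p(\omega_i \mid x).
\]
```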
However, under the KNN method, all of these eight data points are predicted as class 1. This means

