Computational Intelligence - August 2015 - 31

Abstract: Random Projection (RP) is a popular technique for dimensionality reduction because of its high computational efficiency. However, RP may not yield a highly discriminative low-dimensional space, and thus may not produce the best pattern classification performance, since the random transformation matrix of RP is independent of the data. In this paper, we propose a Semi-Random Projection (SRP) framework, which inherits the random feature sampling of RP but employs a learning mechanism to determine the transformation matrix. One advantage of SRP is that it strikes a good balance between computational complexity and classification accuracy. Another advantage is that multiple SRP modules can be stacked to form a deep learning architecture for compact and robust feature learning. In addition, based on the insight into the relationship between RP and the Extreme Learning Machine (ELM), SRP is applied to ELM to derive the Partially Connected ELM (PC-ELM). The hidden nodes of PC-ELM are more discriminative, and hence fewer nodes are needed. Experiments on two real-world text corpora, i.e., 20 Newsgroups and Farm Ads, verify the effectiveness and efficiency of the proposed SRP. Experimental results also show that PC-ELM outperforms ELM on high-dimensional data.

I. Introduction

Machine learning and data mining techniques have been applied in many areas [1]. In some of these reported application scenarios, the data has very high dimensionality. For example, a hypermarket usually stocks up to several thousand merchandise items, each of which can be considered as a dimension of the purchase-record data. Clearly, purchase records are very high-dimensional data. In text mining, a document is often represented by a vector whose dimensionality is equal to the vocabulary size. Besides very high dimensionality, the data are often sparse, since a customer buys only a few goods on each visit, and the word usage in one document covers just a small portion of the entire vocabulary. High dimensionality and sparsity make distance metrics less meaningful, which in turn deteriorates the performance of many distance-based machine learning and data mining approaches. High dimensionality also imposes a heavy computational overhead on machine learning and data mining algorithms. Dimensionality reduction is an effective technique for tackling these problems.
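The claim that distances become less meaningful can be illustrated with a quick numerical check (a minimal sketch on synthetic uniform data, not an experiment from this paper): as the dimension grows, the farthest and nearest neighbors of a query point become almost equidistant, so distance comparisons carry little information.

```python
import numpy as np

# Distance concentration: compare the max/min distance ratio from a
# random query to 500 random points, in low vs. high dimension.
rng = np.random.default_rng(0)
ratios = {}
for d in (2, 2000):
    X = rng.uniform(size=(500, d))   # 500 random points in [0, 1]^d
    q = rng.uniform(size=d)          # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    ratios[d] = dist.max() / dist.min()
print(ratios)  # the ratio collapses toward 1 as d grows
```

For d = 2000 the ratio is close to 1, i.e., "nearest" and "farthest" are nearly indistinguishable, which is exactly what degrades distance-based learners.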
Dimensionality reduction maps data from a high-dimensional space to a space of lower dimension under the
assumption that the intrinsic structure of the high-dimensional data can be retained in the low-dimensional space.
Among the various dimensionality reduction techniques, two
commonly used methods are Principal Component Analysis
(PCA) and Linear Discriminant Analysis (LDA) [2], [3], [4].

Digital Object Identifier 10.1109/MCI.2015.2437316
Date of publication: 16 July 2015

PCA seeks dimensions with maximum variance in an unsupervised way, while LDA tries to find dimensions with maximum discriminative power in a supervised way. In a very high-dimensional space, however, direct use of PCA and LDA might be problematic, since the computational burden of these algorithms increases dramatically with the number of data dimensions. To address this computational issue, Random Projection (RP), which maps data to a randomly generated low-dimensional latent space, was proposed in [5], [6]. Compared to other linear dimensionality reduction methods such as PCA and LDA, RP is computationally much cheaper. However, the latent space of RP is generated randomly, without considering the structure of the original data. Therefore, RP may not capture the discriminative information underlying the original data. It is generally acknowledged that a low-dimensional and highly discriminative space is preferred in machine learning tasks, including visualization, clustering, and classification.
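A Gaussian random projection of the kind described above can be sketched in a few lines (a minimal illustration of the general idea, not the specific construction of [5], [6]):

```python
import numpy as np

def random_projection(X, r, seed=0):
    """Map n x d data X to r dimensions with a data-independent Gaussian
    matrix; pairwise distances are approximately preserved
    (Johnson-Lindenstrauss), but the directions ignore any class structure."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    # Entries drawn i.i.d. from N(0, 1/r); the matrix never looks at X.
    R = rng.normal(0.0, 1.0 / np.sqrt(r), size=(d, r))
    return X @ R

X = np.random.default_rng(1).normal(size=(100, 5000))  # toy high-dimensional data
Z = random_projection(X, r=50)
print(Z.shape)  # (100, 50)
```

The projection costs only one matrix multiply and no training, which is the efficiency RP is valued for; the price is that the random matrix R is oblivious to the data, the limitation SRP targets.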
In this paper, we propose a new dimensionality reduction framework, named Semi-Random Projection (SRP), with the purpose of finding a latent space with large discriminative power at a feasible computational cost. Assuming the original data dimension and the reduced dimension are d and r, respectively, we first select d_s features at random, where d_s ≪ d. We then project the data in the subspace spanned by the d_s features onto a single dimension using a transformation vector. This process is repeated r times to obtain an r-dimensional latent space. Different from RP, whose transformation vectors are generated randomly, the transformation vectors in our method are learned from data. The transformation vector in each iteration is computed in a d_s-dimensional subspace, and hence demands much less computation than in the original d-dimensional space.
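The SRP loop described above can be sketched as follows. Note the learning step for each transformation vector is not specified in this excerpt, so the class-mean-difference direction used below is a hypothetical stand-in for the paper's actual criterion; only the random-sampling-then-learn structure is taken from the text.

```python
import numpy as np

def semi_random_projection(X, y, r, d_s, seed=0):
    """Sketch of SRP: each of the r output dimensions randomly samples
    d_s of the d input features (as in RP), then LEARNS its projection
    vector on that subspace. The learning rule here (a mean-difference
    direction for binary labels y) is a hypothetical placeholder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = np.empty((n, r))
    for j in range(r):
        idx = rng.choice(d, size=d_s, replace=False)  # random feature sampling
        Xs = X[:, idx]                                # d_s-dimensional subspace
        # Placeholder learning step: direction separating the class means
        w = Xs[y == 1].mean(axis=0) - Xs[y == 0].mean(axis=0)
        w /= np.linalg.norm(w) + 1e-12                # learned transformation vector
        Z[:, j] = Xs @ w                              # one latent dimension
    return Z

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 1000))
y = np.repeat([0, 1], 30)
Z = semi_random_projection(X, y, r=20, d_s=50)
print(Z.shape)  # (60, 20)
```

Because each vector is fit in only d_s dimensions rather than all d, the per-dimension learning cost stays small, which is the computational balance the framework aims for.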
