IEEE Signal Processing - May 2018 - 105
Phonological analysis starts with a short-term analysis of
speech, which consists of converting the speech signal into a
sequence of acoustic feature vectors X = {vx 1, f, vx n, f, vx N }.
Each vx n is also known as an acoustic frame or just frame, and
can be composed by the conventional Mel-frequency cepstral
coefficients (MFCCs). The Mel scale is a perceptual scale
often used in speech signal processing. N is the number of
frames, and the frames are equally spaced in time.
Then, K phonological probabilities z kn are estimated for
each frame. Each probability is computed independently
using a binary classifier based on deep NNs (DNNs) and
trained with one phonological class versus the rest. Finally, the
acoustic feature observation sequence X is transformed into a
sequence of phonological vectors Z = {vz 1, f, vz n, f, vz N }.
Each vector vz n = [z 1n, f, z kn, f, z Kn ] < consists of phonological
class posterior probabilities z kn = p (c k | x n) of K phonological
features (classes) c k . The a posteriori estimates p (c k | x n) are
K
0 # p (c k | x n) # 1, 6k, and max / k = 1 p (c k | x n) = K. The z kn
features can be further quantized or compressed using sparse
coding that relies on the structured sparsity of the phonological features.
Asaei et al. [53] demonstrated that structured sparse coding
of the binary features enables the codec to operate at 700 bits/s
without imposing any latency or quality loss with respect to the
earlier developed vocoder [55]. By considering a latency of
about 256 ms, the bit rate of about 300 bits/s is achieved without the requirement of any prior knowledge on suprasegmental
(e.g., syllabic) identities.
Compressive sampling (CS) relies on sparse representation
to re-construct high-dimensional data using very few linear
nonadaptive observations. A data representation of a ! R N is
K-sparse if only K % N entries of a have nonzero values. The set
of indices corresponding to the nonzero entries is called the support of a. The CS theory asserts that only M = O (K log (N/K))
linear measurements, z ! R M obtained as
z =Da
(CS coder),
(2)
suffice to reconstruct a, where D ! R M # N is a compressive
measurement matrix that preserves the pairwise distances of
the sparse features a in the compressed code z.
At the coding step, the choice of compressive measurement
matrix D is very important. A sufficient, but not necessary
condition on D to guarantee decoding of the sparse representation coefficients is that all pairwise distances between K-sparse
representations must be well preserved in the observation space,
or equivalently, all subsets of K columns taken from the measurement matrix are nearly orthogonal. This condition on the
compressive measurement matrix is referred to as the restricted
isometry property (RIP). The random matrices generated by
sampling from Gaussian or Bernoulli distributions are proved to
satisfy the RIP condition [87]. The choice of a Bernoulli matrix
is demonstrated to achieve higher robustness to quantization [53].
Given an observation vector z, and the measurement matrix
D, the sparse representation a is obtained by the optimization
problem stated as
min
< a < 0 subject to z = Da (sparse decoder),
a
(3)
where the counting function < . < 0: R M " N returns the number
of nonzero components in its argument. The nonconvex objective < a < 0 is often relaxed to < a < 1 = / i a i , which can be
solved in polynomial time [88]. Recent advances in CS exploits
interdependency structure underlying the support of the sparse
coefficients in recovery algorithms to reduce the number of
required observations and to better differentiate the true coefficients from recovered artifacts for higher quality [89].
Long-term linguistic coding
Long-term analysis includes the analysis of speech information encoded at stress d (1-3 Hz) and syllabic i (4-8 Hz)
timescales. An important module that facilitates extraction of
this information is a robust syllable boundary detector.
The neuromorphic syllable detector [51] has some desirable properties that are very suitable for speech processing
systems. Similarly, as humans encode speech incrementally,
(i.e., not considering future temporal context), the proposed
method works incrementally as well. In addition, it is highly
robust noise. Syllabification performance at different noise
conditions was compared to the existing Mermelstein and
group-delay algorithms. While the performance of the existing
methods depends on the type of noise and signal-to-noise
ratio, the performance of the proposed method is constant
under all noise conditions. Figure 7 shows the neural oscillation that automatically locks to slow speech fluctuations that
convey the syllabic rhythm.
Composition of short- and long-term NNs
An NN-based speech coder can be realized as a composition
of deep and spiking NNs [90]. The deep NNs encode and
decode the speech signal based on the binary phonological
speech representation, and the spiking net based on neuromorphic syllabification, previously described, is used for prosody
encoding. Prosody represents the patterns of stress and intonation of the speech signal. Figure 8 shows the training and
inference stages of the three different NNs.
(a)
200 ms
(b)
Figure 7. A neuromorphic model output for one examplary sentence,
"Alfafa is healthy for you." (a) Dark green lines show excitatory neurons
spikes and (b) light green lines represent inhibitory neuron spikes. Vertical
lines on top of the spectrogram indicate actual syllable boundaries [42].
IEEE Signal Processing Magazine
|
May 2018
|
105
Table of Contents for the Digital Edition of IEEE Signal Processing - May 2018
Contents
IEEE Signal Processing - May 2018 - Cover1
IEEE Signal Processing - May 2018 - Cover2
IEEE Signal Processing - May 2018 - Contents
IEEE Signal Processing - May 2018 - 2
IEEE Signal Processing - May 2018 - 3
IEEE Signal Processing - May 2018 - 4
IEEE Signal Processing - May 2018 - 5
IEEE Signal Processing - May 2018 - 6
IEEE Signal Processing - May 2018 - 7
IEEE Signal Processing - May 2018 - 8
IEEE Signal Processing - May 2018 - 9
IEEE Signal Processing - May 2018 - 10
IEEE Signal Processing - May 2018 - 11
IEEE Signal Processing - May 2018 - 12
IEEE Signal Processing - May 2018 - 13
IEEE Signal Processing - May 2018 - 14
IEEE Signal Processing - May 2018 - 15
IEEE Signal Processing - May 2018 - 16
IEEE Signal Processing - May 2018 - 17
IEEE Signal Processing - May 2018 - 18
IEEE Signal Processing - May 2018 - 19
IEEE Signal Processing - May 2018 - 20
IEEE Signal Processing - May 2018 - 21
IEEE Signal Processing - May 2018 - 22
IEEE Signal Processing - May 2018 - 23
IEEE Signal Processing - May 2018 - 24
IEEE Signal Processing - May 2018 - 25
IEEE Signal Processing - May 2018 - 26
IEEE Signal Processing - May 2018 - 27
IEEE Signal Processing - May 2018 - 28
IEEE Signal Processing - May 2018 - 29
IEEE Signal Processing - May 2018 - 30
IEEE Signal Processing - May 2018 - 31
IEEE Signal Processing - May 2018 - 32
IEEE Signal Processing - May 2018 - 33
IEEE Signal Processing - May 2018 - 34
IEEE Signal Processing - May 2018 - 35
IEEE Signal Processing - May 2018 - 36
IEEE Signal Processing - May 2018 - 37
IEEE Signal Processing - May 2018 - 38
IEEE Signal Processing - May 2018 - 39
IEEE Signal Processing - May 2018 - 40
IEEE Signal Processing - May 2018 - 41
IEEE Signal Processing - May 2018 - 42
IEEE Signal Processing - May 2018 - 43
IEEE Signal Processing - May 2018 - 44
IEEE Signal Processing - May 2018 - 45
IEEE Signal Processing - May 2018 - 46
IEEE Signal Processing - May 2018 - 47
IEEE Signal Processing - May 2018 - 48
IEEE Signal Processing - May 2018 - 49
IEEE Signal Processing - May 2018 - 50
IEEE Signal Processing - May 2018 - 51
IEEE Signal Processing - May 2018 - 52
IEEE Signal Processing - May 2018 - 53
IEEE Signal Processing - May 2018 - 54
IEEE Signal Processing - May 2018 - 55
IEEE Signal Processing - May 2018 - 56
IEEE Signal Processing - May 2018 - 57
IEEE Signal Processing - May 2018 - 58
IEEE Signal Processing - May 2018 - 59
IEEE Signal Processing - May 2018 - 60
IEEE Signal Processing - May 2018 - 61
IEEE Signal Processing - May 2018 - 62
IEEE Signal Processing - May 2018 - 63
IEEE Signal Processing - May 2018 - 64
IEEE Signal Processing - May 2018 - 65
IEEE Signal Processing - May 2018 - 66
IEEE Signal Processing - May 2018 - 67
IEEE Signal Processing - May 2018 - 68
IEEE Signal Processing - May 2018 - 69
IEEE Signal Processing - May 2018 - 70
IEEE Signal Processing - May 2018 - 71
IEEE Signal Processing - May 2018 - 72
IEEE Signal Processing - May 2018 - 73
IEEE Signal Processing - May 2018 - 74
IEEE Signal Processing - May 2018 - 75
IEEE Signal Processing - May 2018 - 76
IEEE Signal Processing - May 2018 - 77
IEEE Signal Processing - May 2018 - 78
IEEE Signal Processing - May 2018 - 79
IEEE Signal Processing - May 2018 - 80
IEEE Signal Processing - May 2018 - 81
IEEE Signal Processing - May 2018 - 82
IEEE Signal Processing - May 2018 - 83
IEEE Signal Processing - May 2018 - 84
IEEE Signal Processing - May 2018 - 85
IEEE Signal Processing - May 2018 - 86
IEEE Signal Processing - May 2018 - 87
IEEE Signal Processing - May 2018 - 88
IEEE Signal Processing - May 2018 - 89
IEEE Signal Processing - May 2018 - 90
IEEE Signal Processing - May 2018 - 91
IEEE Signal Processing - May 2018 - 92
IEEE Signal Processing - May 2018 - 93
IEEE Signal Processing - May 2018 - 94
IEEE Signal Processing - May 2018 - 95
IEEE Signal Processing - May 2018 - 96
IEEE Signal Processing - May 2018 - 97
IEEE Signal Processing - May 2018 - 98
IEEE Signal Processing - May 2018 - 99
IEEE Signal Processing - May 2018 - 100
IEEE Signal Processing - May 2018 - 101
IEEE Signal Processing - May 2018 - 102
IEEE Signal Processing - May 2018 - 103
IEEE Signal Processing - May 2018 - 104
IEEE Signal Processing - May 2018 - 105
IEEE Signal Processing - May 2018 - 106
IEEE Signal Processing - May 2018 - 107
IEEE Signal Processing - May 2018 - 108
IEEE Signal Processing - May 2018 - 109
IEEE Signal Processing - May 2018 - 110
IEEE Signal Processing - May 2018 - 111
IEEE Signal Processing - May 2018 - 112
IEEE Signal Processing - May 2018 - 113
IEEE Signal Processing - May 2018 - 114
IEEE Signal Processing - May 2018 - 115
IEEE Signal Processing - May 2018 - 116
IEEE Signal Processing - May 2018 - 117
IEEE Signal Processing - May 2018 - 118
IEEE Signal Processing - May 2018 - 119
IEEE Signal Processing - May 2018 - 120
IEEE Signal Processing - May 2018 - 121
IEEE Signal Processing - May 2018 - 122
IEEE Signal Processing - May 2018 - 123
IEEE Signal Processing - May 2018 - 124
IEEE Signal Processing - May 2018 - 125
IEEE Signal Processing - May 2018 - 126
IEEE Signal Processing - May 2018 - 127
IEEE Signal Processing - May 2018 - 128
IEEE Signal Processing - May 2018 - 129
IEEE Signal Processing - May 2018 - 130
IEEE Signal Processing - May 2018 - 131
IEEE Signal Processing - May 2018 - 132
IEEE Signal Processing - May 2018 - Cover3
IEEE Signal Processing - May 2018 - Cover4
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201809
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201807
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201805
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201803
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201801
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0917
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0717
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0517
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0317
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0916
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0716
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0516
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0316
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0915
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0715
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0515
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0315
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0914
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0714
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0514
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0314
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0913
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0713
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0513
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0313
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0912
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0712
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0512
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0312
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0911
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0711
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0511
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0311
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0910
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0710
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0510
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0310
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0909
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0709
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0509
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0309
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1108
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0908
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0708
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0508
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0308
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0108
https://www.nxtbookmedia.com