Signal Processing - July 2017 - 115

Table 1. Selected data-exploitation studies on the limited labeled speech resource. (continued )
Publications

Types

Approaches

Models

Applications

Databases and Languages

Hamanaka et al. 2010 [82]

AL

Query by committee

GMM-HMM

ASR

Corpus of Spontaneous Japanese (Ja)

Zhang and Schuller 2012 [83]

AL

Meta query

SVM

ER

FAU AEC (Ge)

Zhang et al. 2015 [84]

AL

Meta query

SVM

ER

FAU AEC (Ge)

Riccardi and Hakkani-Tür
2003 [85]

CL

Confidence score

HMM

ASR

"How May I Help You?" database (En)

Yu et al. 2010 [86]

CL

Confidence score

HMM

ASR

Broadcast Conv. and News corpora (Ma)

Zhang et al. 2015 [17]

CL

Confidence score

SVM

ER

FAU AEC (Ge), SUSAS (En)

Yu et al. 2010 [87]

CL

Global-entropy based

HMM

ASR

Directory assistance data (En)

BN: Bayes network; CTC: connectionist temporal classification; NB: naive Bayes; LR: logistic regression; RDT: randomized decision making; DAU: data augmentation; SS: speech
synthesis; ER: emotion recognition; As/Da/En/Fr/Ge/Ha/Ja/Ma/Sp/Tu/Vi/Xi/Zu: Assamese/Danish/English/French/German/Haitian Creole/Japanese/Mandarin/Spanish/
Turkish/Vietnamese/Xitsonga/Zulu.

where a, the wrapping factor, is randomly chosen from
[0.9, 1.1] . The results presented in [40] indicate that, in
terms of the phone error rate, deep networks trained on a
VTLP-augmented version of a small database can outperform the deep networks trained on the original data set.
Based on that work, a deterministic perturbation (i.e., a
changes in the range of warping factors with a fixed
step) rather than a random perturbation was proposed and
investigated [42].
SFM, inspired by voice conversion paradigms, seeks to
utilize the acoustic-feature-space relationship among speakers when augmenting a data set [41]. Specifically, it augments
training utterances by statistically converting one speaker's
speech data to another's using
x l = x · M,

(6)

where M is a transformation matrix of the feature spaces
between two speakers. The experimental results given in [41]
show that SFM offers improved performance over VTLP on
both ASR and keyword spotting (KWS) tasks.
Other data augmentation approaches include tempo-based,
speed-based, and volume-based perturbations [43]. Tempobased perturbation modifies the speech tempo while retaining
the pitch and the spectral envelope. Speed-based perturbation
varies the speech speed by resampling, whereas volume-based
perturbation changes the amplitude of signals.
While data augmentation approaches have frequently been
effective in ASR tasks [7], [44], this has not proved to be as
much the case in other ASA tasks, particularly in computational
paralinguistics [45]. A potential reason for this might be that the
detection of speaker states and traits (e.g., emotion, age, and gender) is more sensitive to changes in speech variation. Therefore,
training on inappropriately transformed speech would lead to a
worse model. Emotion, for example, is known to be related to the
speech tempo; speech with faster tempo is inclined to be recognized as higher arousal in emotion recognition, so changing the
associated speech tempo from fast to slow would potentially lead
to badly labeled training data.

Continued research efforts being undertaken to distinguish features that are task specific or task invariant could
help facilitate the application of data augmentation to other
speech analysis tasks. In addition, most recent applications of
data augmentation are performed for deep learning [7]. The
effectiveness of these techniques on shallow discriminative or
generative models is yet to be established.

Speech synthesis
Similar to data augmentation, the speech synthesis approach
aims to synthesize additional labeled data, i.e., L syn = SS (L),
such that the new labeled data set Ll is updated by
Ll = L , L syn . Theoretically, speech synthesis can produce an
infinite amount of labeled data via altering speech content or
modifying the parameters of a speech synthesizer. However, as
the parameters of the synthesizers have a limited range, the
simulated speech data often face the problem of limited variations. This can consequently result in the overfitting issue when
training models. Combining the synthesized speech data with
natural instances has been shown to help minimize this overfitting issue [46]. For emotion recognition in speech, it has been
shown that systems trained on synthesized speech (the test data
was natural speech) can deliver competitive performance when
compared to equivalent systems trained on natural speech [46].
In this article, two synthesizers rendering emotional speech-
Emofilt and Mbrola-were utilized to artificially generate
speech colored with predefined emotions [46].
Rather than directly synthesizing waveforms, an alternative
is generating parameterized speech that can be used directly
for training a discriminative classifier. Gales et al. [47] used
a hidden Markov model (HMM)-based statistical synthesis to
generate missing words in a training set, when building wordbased support vector machines (SVMs) for ASR. The results
presented indicate that this HMM-based synthesis approach
was able to yield gains over the baseline. Inspired by the success of deep learning, an emerging research trend is to use NNs
rather than HMMs to generate speech samples [48], which may
also mature in terms of the variation of synthesized speaker
states and traits.

IEEE SIGNAL PROCESSING MAGAZINE

|

July 2017

|

115



Table of Contents for the Digital Edition of Signal Processing - July 2017

Signal Processing - July 2017 - Cover1
Signal Processing - July 2017 - Cover2
Signal Processing - July 2017 - 1
Signal Processing - July 2017 - 2
Signal Processing - July 2017 - 3
Signal Processing - July 2017 - 4
Signal Processing - July 2017 - 5
Signal Processing - July 2017 - 6
Signal Processing - July 2017 - 7
Signal Processing - July 2017 - 8
Signal Processing - July 2017 - 9
Signal Processing - July 2017 - 10
Signal Processing - July 2017 - 11
Signal Processing - July 2017 - 12
Signal Processing - July 2017 - 13
Signal Processing - July 2017 - 14
Signal Processing - July 2017 - 15
Signal Processing - July 2017 - 16
Signal Processing - July 2017 - 17
Signal Processing - July 2017 - 18
Signal Processing - July 2017 - 19
Signal Processing - July 2017 - 20
Signal Processing - July 2017 - 21
Signal Processing - July 2017 - 22
Signal Processing - July 2017 - 23
Signal Processing - July 2017 - 24
Signal Processing - July 2017 - 25
Signal Processing - July 2017 - 26
Signal Processing - July 2017 - 27
Signal Processing - July 2017 - 28
Signal Processing - July 2017 - 29
Signal Processing - July 2017 - 30
Signal Processing - July 2017 - 31
Signal Processing - July 2017 - 32
Signal Processing - July 2017 - 33
Signal Processing - July 2017 - 34
Signal Processing - July 2017 - 35
Signal Processing - July 2017 - 36
Signal Processing - July 2017 - 37
Signal Processing - July 2017 - 38
Signal Processing - July 2017 - 39
Signal Processing - July 2017 - 40
Signal Processing - July 2017 - 41
Signal Processing - July 2017 - 42
Signal Processing - July 2017 - 43
Signal Processing - July 2017 - 44
Signal Processing - July 2017 - 45
Signal Processing - July 2017 - 46
Signal Processing - July 2017 - 47
Signal Processing - July 2017 - 48
Signal Processing - July 2017 - 49
Signal Processing - July 2017 - 50
Signal Processing - July 2017 - 51
Signal Processing - July 2017 - 52
Signal Processing - July 2017 - 53
Signal Processing - July 2017 - 54
Signal Processing - July 2017 - 55
Signal Processing - July 2017 - 56
Signal Processing - July 2017 - 57
Signal Processing - July 2017 - 58
Signal Processing - July 2017 - 59
Signal Processing - July 2017 - 60
Signal Processing - July 2017 - 61
Signal Processing - July 2017 - 62
Signal Processing - July 2017 - 63
Signal Processing - July 2017 - 64
Signal Processing - July 2017 - 65
Signal Processing - July 2017 - 66
Signal Processing - July 2017 - 67
Signal Processing - July 2017 - 68
Signal Processing - July 2017 - 69
Signal Processing - July 2017 - 70
Signal Processing - July 2017 - 71
Signal Processing - July 2017 - 72
Signal Processing - July 2017 - 73
Signal Processing - July 2017 - 74
Signal Processing - July 2017 - 75
Signal Processing - July 2017 - 76
Signal Processing - July 2017 - 77
Signal Processing - July 2017 - 78
Signal Processing - July 2017 - 79
Signal Processing - July 2017 - 80
Signal Processing - July 2017 - 81
Signal Processing - July 2017 - 82
Signal Processing - July 2017 - 83
Signal Processing - July 2017 - 84
Signal Processing - July 2017 - 85
Signal Processing - July 2017 - 86
Signal Processing - July 2017 - 87
Signal Processing - July 2017 - 88
Signal Processing - July 2017 - 89
Signal Processing - July 2017 - 90
Signal Processing - July 2017 - 91
Signal Processing - July 2017 - 92
Signal Processing - July 2017 - 93
Signal Processing - July 2017 - 94
Signal Processing - July 2017 - 95
Signal Processing - July 2017 - 96
Signal Processing - July 2017 - 97
Signal Processing - July 2017 - 98
Signal Processing - July 2017 - 99
Signal Processing - July 2017 - 100
Signal Processing - July 2017 - 101
Signal Processing - July 2017 - 102
Signal Processing - July 2017 - 103
Signal Processing - July 2017 - 104
Signal Processing - July 2017 - 105
Signal Processing - July 2017 - 106
Signal Processing - July 2017 - 107
Signal Processing - July 2017 - 108
Signal Processing - July 2017 - 109
Signal Processing - July 2017 - 110
Signal Processing - July 2017 - 111
Signal Processing - July 2017 - 112
Signal Processing - July 2017 - 113
Signal Processing - July 2017 - 114
Signal Processing - July 2017 - 115
Signal Processing - July 2017 - 116
Signal Processing - July 2017 - 117
Signal Processing - July 2017 - 118
Signal Processing - July 2017 - 119
Signal Processing - July 2017 - 120
Signal Processing - July 2017 - 121
Signal Processing - July 2017 - 122
Signal Processing - July 2017 - 123
Signal Processing - July 2017 - 124
Signal Processing - July 2017 - 125
Signal Processing - July 2017 - 126
Signal Processing - July 2017 - 127
Signal Processing - July 2017 - 128
Signal Processing - July 2017 - 129
Signal Processing - July 2017 - 130
Signal Processing - July 2017 - 131
Signal Processing - July 2017 - 132
Signal Processing - July 2017 - 133
Signal Processing - July 2017 - 134
Signal Processing - July 2017 - 135
Signal Processing - July 2017 - 136
Signal Processing - July 2017 - 137
Signal Processing - July 2017 - 138
Signal Processing - July 2017 - 139
Signal Processing - July 2017 - 140
Signal Processing - July 2017 - 141
Signal Processing - July 2017 - 142
Signal Processing - July 2017 - 143
Signal Processing - July 2017 - 144
Signal Processing - July 2017 - 145
Signal Processing - July 2017 - 146
Signal Processing - July 2017 - 147
Signal Processing - July 2017 - 148
Signal Processing - July 2017 - 149
Signal Processing - July 2017 - 150
Signal Processing - July 2017 - 151
Signal Processing - July 2017 - 152
Signal Processing - July 2017 - 153
Signal Processing - July 2017 - 154
Signal Processing - July 2017 - 155
Signal Processing - July 2017 - 156
Signal Processing - July 2017 - 157
Signal Processing - July 2017 - 158
Signal Processing - July 2017 - 159
Signal Processing - July 2017 - 160
Signal Processing - July 2017 - 161
Signal Processing - July 2017 - 162
Signal Processing - July 2017 - 163
Signal Processing - July 2017 - 164
Signal Processing - July 2017 - 165
Signal Processing - July 2017 - 166
Signal Processing - July 2017 - 167
Signal Processing - July 2017 - 168
Signal Processing - July 2017 - 169
Signal Processing - July 2017 - 170
Signal Processing - July 2017 - 171
Signal Processing - July 2017 - 172
Signal Processing - July 2017 - 173
Signal Processing - July 2017 - 174
Signal Processing - July 2017 - 175
Signal Processing - July 2017 - 176
Signal Processing - July 2017 - 177
Signal Processing - July 2017 - 178
Signal Processing - July 2017 - 179
Signal Processing - July 2017 - 180
Signal Processing - July 2017 - 181
Signal Processing - July 2017 - 182
Signal Processing - July 2017 - 183
Signal Processing - July 2017 - 184
Signal Processing - July 2017 - 185
Signal Processing - July 2017 - 186
Signal Processing - July 2017 - 187
Signal Processing - July 2017 - 188
Signal Processing - July 2017 - 189
Signal Processing - July 2017 - 190
Signal Processing - July 2017 - 191
Signal Processing - July 2017 - 192
Signal Processing - July 2017 - 193
Signal Processing - July 2017 - 194
Signal Processing - July 2017 - 195
Signal Processing - July 2017 - 196
Signal Processing - July 2017 - Cover3
Signal Processing - July 2017 - Cover4
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201809
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201807
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201805
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201803
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_201801
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0917
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0717
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0517
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0317
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0117
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0916
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0716
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0516
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0316
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0116
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0915
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0715
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0515
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0315
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0115
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0914
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0714
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0514
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0314
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0114
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0913
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0713
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0513
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0313
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0113
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0912
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0712
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0512
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0312
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0112
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0911
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0711
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0511
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0311
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0111
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0910
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0710
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0510
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0310
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0110
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0909
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0709
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0509
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0309
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0109
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_1108
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0908
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0708
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0508
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0308
https://www.nxtbook.com/nxtbooks/ieee/signalprocessing_0108
https://www.nxtbookmedia.com