Signal Processing - July 2017 - 127

Björn Schuller (schuller@ieee.org) received his diploma,
doctoral degree, and habilitation degree in electrical engineering
and information technology from the Technische Universität
München, Germany in 1999, 2006, and 2012, respectively. He is
a reader in machine learning in the Department of Computing at
Imperial College London, United Kingdom, and a full professor and
head of the Chair of Complex and Intelligent Systems,
University of Passau, Germany, where he previously headed the
Chair of Sensor Systems. He is a Senior Member of the IEEE.

References

[1] D. O'Shaughnessy, Speech Communications: Human and Machine, 2nd ed.
Piscataway, NJ: IEEE Press, 2000.
[2] F. Weng, P. Angkititrakul, E. E. Shriberg, L. Heck, S. Peters, and J. H. L.
Hansen, "Conversational in-vehicle dialog systems: The past, present, and future,"
IEEE Signal Process. Mag., vol. 33, no. 6, pp. 49-60, Nov. 2016.
[3] B. W. Schuller, "The computational paralinguistics challenge," IEEE Signal
Process. Mag., vol. 29, no. 4, pp. 97-101, July 2012.
[4] C. Moseley, Atlas of the World's Languages in Danger, 3rd ed. Paris: Unesco
Publishing, 2010.
[5] A. Halevy, P. Norvig, and F. Pereira, "The unreasonable effectiveness of data,"
IEEE Intell. Syst., vol. 24, no. 2, pp. 8-12, Mar. 2009.
[6] L. Deng and D. Yu, "Deep learning: Methods and applications," Foundations
and Trends in Signal Process., vol. 7, no. 3-4, pp. 197-387, June 2014.
[7] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J.
Chen, M. Chrzanowski, et al., "Deep speech 2: End-to-end speech recognition in
English and Mandarin," in Proc. Int. Conf. Machine Learning (ICML), New York,
2016, pp. 173-182.
[8] L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition.
Englewood Cliffs, NJ: Prentice Hall, 1993.
[9] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end
factor analysis for speaker verification," IEEE Trans. Audio, Speech, Language
Process., vol. 19, no. 4, pp. 788-798, May 2011.
[10] F. Eyben, F. Weninger, F. Gross, and B. Schuller, "Recent developments in
openSMILE, the Munich open-source multimedia feature extractor," in Proc. 21st
ACM Int. Conf. Multimedia, Barcelona, Spain, 2013, pp. 835-838.
[11] M. Harper. IARPA Babel program. Intelligence Advanced Research Projects
Activity, Office of the Director of National Intelligence, Washington, D.C. [Online].
Available: https://www.iarpa.gov/index.php/research-programs/babel
[12] B. W. Schuller, "Speech analysis in the big data era," in Text, Speech, and
Dialogue (Lecture Notes in Computer Science, vol. 9302), P. Král and
V. Matoušek, Eds. Berlin: Springer-Verlag, 2015, pp. 3-11.
[13] M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A. Jansen, and
E. Dupoux, "The zero resource speech challenge 2015," in Proc. INTERSPEECH,
Dresden, Germany, 2015, pp. 3169-3173.
[14] M. R. Robertson. (2015, Nov. 13). 500 hours of video uploaded to YouTube
every minute. Tubular Insights. [Online]. Available:
http://www.reelseo.com/hours-minute-uploaded-youtube
[15] M. Eskénazi, G.-A. Levow, H. Meng, G. Parent, and D. Suendermann,
Crowdsourcing for Speech Processing: Applications to Data Collection,
Transcription and Assessment. Hoboken, NJ: Wiley, 2013.
[16] J. D. Williams, I. D. Melamed, T. Alonso, B. Hollister, and J. Wilpon,
"Crowdsourcing for difficult transcription of speech," in Proc. IEEE Workshop on
Automatic Speech Recognition and Understanding (ASRU), Waikoloa, HI, 2011,
pp. 535-540.
[17] Z. Zhang, E. Coutinho, J. Deng, and B. Schuller, "Cooperative learning and its
application to emotion recognition from speech," IEEE/ACM Trans. Audio, Speech,
Language Process., vol. 23, no. 1, pp. 115-126, Jan. 2015.
[18] Z. Zhang, Semi-Autonomous Data Enrichment and Optimisation for
Intelligent Speech Analysis. Munich, Germany: Verlag Dr. Hut, 2015.
[19] A. Nagórski, L. Boves, and H. J. Steeneken, "Optimal selection of speech data
for automatic speech recognition systems," in Proc. INTERSPEECH, Denver, CO,
2002, pp. 2473-2476.
[20] D. Wang and T. F. Zheng, "Transfer learning for speech and language
processing," in Proc. Asia-Pacific Signal and Information Processing Assoc. Annu.
Summit and Conf. (APSIPA), Hong Kong, China, 2015, pp. 1225-1237.
[21] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl.
Data Eng., vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
[22] L. Deng and X. Li, "Machine learning paradigms for speech recognition: An
overview," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 5, pp.
1060-1089, May 2013.
[23] R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng, "Cheap and fast-But is it
good? Evaluating non-expert annotations for natural language tasks," in Proc.
Conf. Empirical Methods Natural Language Processing (EMNLP), Honolulu, HI,
2008, pp. 254-263.
[24] S. Novotney and C. Callison-Burch, "Cheap, fast and good enough: Automatic
speech recognition with non-expert transcription," in Proc. Human Language
Technologies: 2010 Annu. Conf. North American Chapter Assoc. Computational
Linguistics, Los Angeles, 2010, pp. 207-215.
[25] S. Hantke, T. Appel, F. Eyben, and B. Schuller, "iHEARu-PLAY: Introducing
a game for crowdsourced data collection for affective computing," in Proc. Int.
Conf. Affective Computing and Intelligent Interaction (ACII), Xi'an, China, 2015,
pp. 891-897.
[26] A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur, K. Church,
N. Feldman, et al., "A summary of the 2012 JHU CLSP workshop on zero resource
speech technologies and models of early language acquisition," in Proc. IEEE Int.
Conf. Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada,
2013, pp. 8111-8115.
[27] G. Parent and M. Eskenazi, "Toward better crowdsourced transcription:
Transcription of a year of the Let's Go Bus Information System data," in Proc.
IEEE Spoken Language Technology Workshop (SLT), Berkeley, CA, 2010,
pp. 312-317.
[28] A. Tarasov, S. J. Delany, and C. Cullen, "Using crowdsourcing for labelling
emotional speech assets," in Proc. W3C workshop on Emotion Markup Language
(EmotionML), Paris, 2010, pp. 1-5.
[29] J. Ledlie, B. Odero, E. Minkov, I. Kiss, and J. Polifroni, "Crowd translator: On
building localized speech recognizers through micropayments," ACM SIGOPS
Operating Syst. Rev., vol. 43, no. 4, pp. 84-89, Jan. 2010.
[30] C.-Y. Lee and J. R. Glass, "A transcription task for crowdsourcing with
automatic quality control," in Proc. INTERSPEECH, Florence, Italy, 2011,
pp. 3041-3044.
[31] A. S. Park and J. R. Glass, "Unsupervised pattern discovery in speech," IEEE
Trans. Audio, Speech, Language Process., vol. 16, no. 1, pp. 186-197, Jan. 2008.
[32] K. Levin, K. Henry, A. Jansen, and K. Livescu, "Fixed-dimensional acoustic
embeddings of variable-length segments in low-resource settings," in Proc. IEEE
Workshop Automatic Speech Recognition and Understanding (ASRU), Olomouc,
Czech Republic, 2013, pp. 410-415.
[33] H. Wang, T. Lee, C. C. Leung, B. Ma, and H. Li, "Using parallel tokenizers
with DTW matrix combination for low-resource spoken term detection," in Proc.
IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Vancouver,
Canada, 2013, pp. 8545-8549.
[34] G. Mantena and X. Anguera, "Speed improvements to information
retrieval-based dynamic time warping using hierarchical k-means clustering," in Proc.
IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Vancouver,
Canada, 2013, pp. 8515-8519.
[35] X. Anguera, "Method and system for improved pattern matching," EP Patent
EP12 382 508, 2012.
[36] Y. Chung, C. Wu, C. Shen, H. Lee, and L. Lee, "Audio Word2Vec: Unsupervised
learning of audio segment representations using sequence-to-sequence autoencoder,"
in Proc. INTERSPEECH, San Francisco, CA, 2016, pp. 765-769.
[37] H. Kamper, A. Jansen, and S. Goldwater, "Unsupervised word segmentation
and lexicon discovery using acoustic word embeddings," IEEE/ACM Trans. Audio,
Speech, Language Process., vol. 24, no. 4, pp. 669-679, Apr. 2016.
[38] Y. Zhang and J. R. Glass, "Towards multi-speaker unsupervised speech
pattern discovery," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing
(ICASSP), Dallas, TX, 2010, pp. 4366-4369.
[39] C. Weng, D. Yu, S. Watanabe, and B.-H. F. Juang, "Recurrent deep neural
networks for robust speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech
and Signal Processing (ICASSP), Florence, Italy, 2014, pp. 5532-5536.
[40] N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves
speech recognition," in Proc. ICML Workshop on Deep Learning for Audio, Speech
and Language, Atlanta, GA, 2013.
[41] X. Cui, V. Goel, and B. Kingsbury, "Data augmentation for deep neural
network acoustic modeling," IEEE/ACM Trans. Audio, Speech, Language Process.,
vol. 23, no. 9, pp. 1469-1477, Sept. 2015.
[42] Z. Tüske, P. Golik, D. Nolden, R. Schlüter, and H. Ney, "Data augmentation,
feature combination, and multilingual neural networks to improve ASR and KWS
performance for low-resource languages," in Proc. INTERSPEECH, Singapore,
2014, pp. 1420-1424.
[43] T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech
recognition," in Proc. INTERSPEECH, Dresden, Germany, 2015, pp. 3586-3589.
[44] V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network
architecture for efficient modeling of long temporal contexts," in Proc. INTERSPEECH,
Dresden, Germany, 2015, pp. 3214-3218.
[45] B. Milde and C. Biemann, "Using representation learning and out-of-domain
data for a paralinguistic speech task," in Proc. INTERSPEECH, Dresden, Germany,
2015, pp. 904-908.

IEEE SIGNAL PROCESSING MAGAZINE | July 2017 | 127


