IEEE Signal Processing - May 2018 - 98

Introduction
Speech coding is an essential technology in information transmission and communication systems. Human cognitive processing operates at about 50 bits/s, which corresponds roughly
to the rate of phonemic information in speech (e.g., most
languages have approximately 32 phonemes, each of which can be encoded with 5 bits, and 1 s of speech
contains perhaps ten phonemes), and the sensory system is known
to encode nonredundant structures [1]. Efficient coding maximizes the amount of information conveyed about the sensory
signal to the rest of the brain. The incoming acoustic signal is
transmitted mechanically to the inner ear and undergoes a
highly complex transformation before it is encoded efficiently
by spikes at the auditory nerve. This great efficiency in information representation has inspired speech engineers to incorporate aspects of cognitive processing when developing efficient
speech technologies.
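As a rough check of the 50 bits/s figure quoted above, the arithmetic can be sketched in a few lines of Python. The function name and the assumption of a fixed-length binary code per phoneme are illustrative only and are not taken from the article.

```python
import math


def phonemic_bit_rate(num_phonemes, phonemes_per_second):
    """Bit rate of a fixed-length phoneme code (illustrative assumption).

    Each phoneme is given the smallest whole number of bits that can
    distinguish num_phonemes symbols.
    """
    bits_per_phoneme = math.ceil(math.log2(num_phonemes))
    return bits_per_phoneme * phonemes_per_second


# 32 phonemes -> 5 bits each; about ten phonemes per second of speech.
print(phonemic_bit_rate(32, 10))  # 50
```

A variable-length code that exploits unequal phoneme frequencies would give a lower average rate, so the fixed-length figure is best read as an order-of-magnitude estimate.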
Speech coding is a field in which research has slowed considerably in recent years. This has occurred not because it has
achieved the ultimate in minimizing the bit rate for transparent speech quality, but because recent improvements have been
small and commercial applications (e.g., cell phones) have been
mostly satisfactory for the general public. By the same token,
the growth of available bandwidth has reduced requirements to
compress speech even further. However, better compression is
always desirable (e.g., in large archival systems).
Historically, the mechanisms of speech perception have led
to perceptual speech coding [2], primarily suitable for digital
audio. Substantial progress in this area has come from mechanisms that optimize coder performance for the human ear in
the context of subband (transform) coders. On the other hand,
the most common speech coding for medium-to-low bit rates
is based on models of human speech production, realized as
linear predictive vocoders and analysis-by-synthesis linear
predictive coders. Unified audio and speech coding is usually
realized with real-time switching according to the input signal
type. For example, the Enhanced Voice Services (EVS) coder
standardized in 2015 by the 3rd Generation Partnership Project
(3GPP) offers new features and improvements for low-delay,
real-time communication systems, higher quality for clean/
noisy speech, mixed content, and music, including support for
wideband, superwideband, and fullband content [3]. However,
the core speech compression method is algebraic code-excited
linear prediction (ACELP), proposed in 1987 by Adoul et al.
[4]. Thus, the compression paradigm has not changed significantly in the last 30 years and has its roots in linear predictive
theory dating back to the early 1940s [5], [6].
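For readers unfamiliar with linear prediction, the following minimal sketch illustrates the idea underlying the predictive coders mentioned above: each sample is approximated as a weighted sum of previous samples, and the weights are obtained from the autocorrelation sequence with the Levinson-Durbin recursion. This is not the ACELP or EVS algorithm; the function names and the toy autocorrelation values are assumptions made for illustration.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of x[n] * x[n + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (a, err), where a[k-1] weights x[n-k] in the prediction of
    x[n], and err is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


# Toy autocorrelation of a first-order process with coefficient 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)  # [0.5, 0.0] 0.75
```

In this toy case the second coefficient is zero because a single previous sample already captures all of the predictable structure. A practical coder would compute the autocorrelation from a short windowed frame of speech and quantize the resulting parameters.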
Research studies conducted in the last decade have incorporated additional aspects of speech processing, i.e., the
functional and temporal organization of human speech and
language processing. Figure 1 shows the overall speech perception process. Subcortical structures treat speech signals
no differently from other types of acoustic input.
Such processes have now been fairly well characterized,
specifically in the cochlea, which decomposes the signal into different
frequency channels forming the auditory spectrogram [7].
Automatic speech processing draws much less on the
subsequent stages in the auditory cortex and, in particular,
omits how this continuous representation is transformed into a
discrete representation (i.e., a lexicon). Such a transformation

[Figure 1 diagram: Speech Sound -> Cochlea -> Auditory Spectrogram (continuous signal; speaker sensitive; speech-rate sensitive) -> Auditory Cortex -> Lexicon, e.g., /k/ə/m/h/o/m/r/a/j/t/ə/w/e/ (discrete representation; speaker insensitive; speech-rate insensitive). Note in figure: speech perception must deal with different timescales corresponding to phonemic, syllabic, and prosodic temporal modulations.]
Figure 1. The overall speech perception process carried out by the peripheral (cochlea) and the central auditory (primary auditory cortex) systems. The
green and blue boxes with corresponding vertically spaced lines represent syllabic and phonetic speech segmentation, respectively.



