Signal Processing - November 2016 - 55
open 1-2 inches traveling at 65 miles/hour can be very close to speech level in the frequency band 0-1 kHz. Noise is not only intrusive, reducing the driver's concentration, but also degrades ASR performance and interrupts the dialog flow. While some noise conditions are quite similar across vehicle types, others vary significantly across vehicles, such as the noise produced by engines, turn signals, or wiper blades. Within the same vehicle, noise levels can vary from very low when the engine is idle and windows are closed, to very high when traveling at high speeds with open windows. As with other dialog systems, environmental noise also affects how speakers speak, influencing a wide range of speech characteristics (including task stress, emotion, and Lombard effects).

Another challenge for in-vehicle speech recognition and understanding comes from imperfect speech input. When drivers are under stress, their speech can be less fluent and predictable. Their speech tends to contain more word fragments, restarts and repairs, hesitations, and alternative phrasings. These deviations from standard speech degrade speech recognition performance because of the added variation in acoustics and language.

For acoustic modeling, the classical cross-word modeling approach becomes less effective due to word fragmentation and hesitation. Similarly, speech endpoint detection becomes very difficult because the recognition engine cannot tell whether a silence is a long pause or the end of a request, especially in the presence of noise. For language modeling, word fragments pose a challenge since there are often many proper names to cover, and names can be interrupted, leaving only a fragment. The more word fragments are included in the system, the more confusability is added, and the harder it is to build a low-perplexity language model to constrain the search space.

Initial attempts have been made to improve the detection of disfluent speech [32]. As DNN-HMM and Connectionist Temporal Classification (CTC) are overtaking GMM-HMM [18], [33], DNN-based technologies are predicted to handle disfluent speech better, although their potential benefits need to be validated through real-world in-vehicle use.

Challenges in coordinating conversations
Beyond the challenges of word recognition itself, interacting with increasingly capable voice technologies in the car will require more sophisticated coordination of human-machine conversations. Drivers freed from lower-level driving tasks will have more opportunity for social interactions both inside and (via mobile) outside the car. And the systems in the car that they do communicate with will have advanced knowledge and complexity. As a result, conversation management presents challenges and new opportunities even if one assumes that high-quality word recognition is available.

One basic challenge is determining the addressee of an utterance, since a driver (or passenger) may be speaking to the system, to another person in the car, to a person outside the car (e.g., on a call), or even to an automatic assistant on, e.g., his or her mobile phone. A system needs to know when it is being addressed, and respond only then and not in other contexts. In addition, interrupting one conversation will generally require putting another "on hold," something that people are used to doing with other people, but generally not with systems. Addressee detection will need to scale to be able to suspend as well as resume conversations. It is worth noting that the current practice of "hot" words (wake-up words that are used to engage with a system) serves to initiate an interaction, but not to effectively suspend, resume, or close it.

Even given the correct addressee, another challenge is how to handle interruptions to the fluency of incoming speech so that system responses can be determined in a timely way. As just noted, speech contains pausing and disfluencies [34], and the task of driving only enhances opportunities for distraction and coping with sudden changes in the car or surrounding environment. Such fluency breaks cause important ambiguities for turn-taking. For example, even a simple pause to a navigation system, as in "which road do I turn onto (long pause) after I cross the bridge," produces different results depending on whether the system responds before the pause (incorrect) or after it (correct), a difference that matters for timely interaction. Waiting induces system latency if the speaker was actually done. Acoustic-prosodic features of prepause speech [35], as well as incremental content [36], can be used to better determine whether a user is suspending versus finishing an utterance. Additional challenges exist for handling self-repairs [37] in the driver's speech in real time.

Future challenges in conversational management also include the system production of "conversational grounding," which becomes more necessary as utterance length and complexity increase. In natural conversation, partners employ speech back channels such as "uh-huh," as well as visual cues such as gaze and head nods, to convey to each other that they are "still listening." Mobile interfaces currently display visual information to ground users, but as autonomy increases, audio rather than visual grounding offers the benefit of an eyes-free option. Research on system-produced back channeling [38] offers promise for the future of natural interactive systems, but scaling to grounding in a safety-critical, multiconversation environment requires a better understanding of how users interact with system-produced grounding mechanisms in real time and under cognitive load.

For all of these conversational management tasks, the in-vehicle environment offers unusually rich opportunities for speaker-dependent modeling. With fewer lower-level driving tasks to attend to, drivers are expected in particular to vary dramatically in use of voice for reasons unrelated
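The endpointing ambiguity described on this page (a silence may be a long pause or the end of a request) can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes that frame-level speech/non-speech decisions are already available (the hypothetical `is_speech` list), that frames are 10 ms apart, and that the endpoint is declared after a fixed silence timeout. The function name `detect_endpoint` and all numbers are illustrative assumptions.

```python
from typing import List, Optional

FRAME_MS = 10  # assumed frame hop in milliseconds (illustrative)


def detect_endpoint(is_speech: List[bool], timeout_ms: int,
                    frame_ms: int = FRAME_MS) -> Optional[int]:
    """Return the index of the frame at which the utterance is declared
    finished, or None if no endpoint is declared.

    An endpoint is declared once `timeout_ms` of consecutive non-speech
    frames have been observed after at least one speech frame.
    """
    frames_needed = timeout_ms // frame_ms
    silence_run = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silence_run += 1
            if silence_run >= frames_needed:
                return i             # timeout reached: declare end of request
    return None


# Hypothetical utterance: 1.0 s of speech, an 0.8 s pause,
# 1.2 s of further speech, then 1.5 s of trailing silence.
frames = [True] * 100 + [False] * 80 + [True] * 120 + [False] * 150

short = detect_endpoint(frames, timeout_ms=500)    # fires inside the pause
long_ = detect_endpoint(frames, timeout_ms=1000)   # waits past the pause

print("500 ms timeout  -> endpoint at frame", short)
print("1000 ms timeout -> endpoint at frame", long_)
```

With these assumed values, the short timeout cuts the speaker off during the mid-utterance pause, while the long timeout captures the whole request but adds a full second of latency after the speaker has finished. This is the trade-off that prosodic or incremental-content cues would aim to reduce.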
IEEE Signal Processing Magazine | November 2016 | 55