Signal Processing - November 2016 - 52
[Figure 2 timeline: Apple's Siri (2011), Google Now (2012), Amazon's Echo (2014), and Microsoft's Cortana (2014), followed by assistants listed as in development or ongoing (Facebook's M, X.ai's Amy Ingram, SoundHound's Hound, Viv). Capabilities annotated in the figure include question/answer, recommendations, device control, information query, user profiling, two-way dialogue (Google Assistant), music playback, real-time information, and music recognition.]
FIGURE 2. The development of voice-enabled IPAs.
a number of spin-offs, most notably SRI International's Siri intelligent software assistant.

Nuance speech technologies are behind many successful products in different markets. Due to limitations in speech understanding accuracy and coverage, early Nuance speech dialog systems in call-center applications strictly followed a visual menu design, rather than typical human interaction patterns, leading to low user satisfaction in many cases.

After the acquisition of Siri in 2011, Apple's launch of the IPA on the iPhone marked a turning point in the mass acceptance of speech technologies. With the massive computational power available through the cloud, more applications and AI technologies started to be integrated into dialog systems; see Figure 2. Consequently, IPAs were developed that allow users to operate devices, access information, and manage personal tasks in a much richer way. Notable IPAs include Apple Siri, Google Assistant, Microsoft Cortana, Amazon Echo, IBM Watson, and Baidu. In addition, the major players in the field are building up developer platforms around these IPAs to enable their own AI ecosystems. However, these systems typically do not have much dialog capability and, in most cases, focus on single-turn question-answers (Q&A) and simple actions. As a contrast, text-based chatbots from Facebook, Google, and others make use of dialog technologies in automating services via Messenger, bypassing the dependency on speech technology.

The rapid increase of high-quality cloud-based IPAs has been partly attributed to recent advances in deep learning technologies, especially deep neural networks (DNNs) [16]. With the exception of speaker recognition in the late 1990s [17], deep learning methods have only recently surpassed hidden Markov with Gaussian mixture model (GMM-HMM)-based approaches in speech recognition performance. With powerful computing resources and advances in learning algorithms beyond early approaches [17], [18], hidden Markov models with DNN (DNN-HMM)-based approaches are able to reduce error rates in speech recognition by 50% [18]. Various studies have been conducted on noise robustness with DNN. Some focus on using environmental information via feature enhancement [19]-[22]. Others include different conditions, such as model noise, reverberation, or speaker variation, in an end-to-end speech recognition training with a recurrent neural network [23]. The latter contrasts with traditional approaches of separating ASR optimization into front-end signal processing and back-end model training. Effective front-end approaches have been developed to address acoustic echo, background noises, and multiple sound sources with microphone array processing [24]-[26]. The improved recognition accuracy significantly benefits the usability and adoption of the general IPAs.

The automotive industry has taken two different approaches to address the arrival of cloud-based, voice-enabled assistance systems from major IT companies. The shallow integration approach leverages in-vehicle microphones, loudspeakers, control buttons, or a head unit screen to enable mobile devices to synchronize the systems' look and feel. This approach is represented by the MirrorLink standard from Car Connectivity Consortium, Apple's CarPlay, and Google's Android Auto. The deep integration approach requires the integration of the embedded and cloud systems at a component level. Recent experiments have shown that combining ASR results from embedded and cloud-based engines can reduce word error rate by up to almost 30% for
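
The error-rate figures quoted on this page (a 50% reduction for DNN-HMM systems, up to almost 30% for combined embedded and cloud engines) are relative reductions in word error rate (WER). The article does not say how its cited experiments computed WER, so the following is only a minimal sketch of the standard definition: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function names and the sample sentences are illustrative assumptions, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "navigate to the nearest charging station"
    hypothesis = "navigate to nearest charging station"
    print(word_error_rate(reference, hypothesis))
    print(relative_reduction(0.20, 0.10))
```

Under this definition, a system that moves from 20% WER to 10% WER has achieved a 50% relative reduction, which is the sense in which such percentages are usually reported.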
IEEE Signal Processing Magazine | November 2016 | 52