IEEE Circuits and Systems Magazine - Q4 2019 - 20

and complex information contained in acoustic signals is
useful for spatial location, determining the species of
a sound source, the identity of a speaker and the message
being conveyed [2]. Since digital signal processing and
machine learning technology were incorporated into
acoustic signal processing systems, the
effectiveness of algorithms that describe, categorize
and interpret all manner of sounds has improved
significantly. Recently, the focus of research in machine
learning algorithms has moved to more practical and
emerging fields such as speech/speaker recognition, speech
synthesis and acoustic event/scene recognition. This
paper provides a thorough survey of state-of-the-art deep learning algorithms applied to Speech Synthesis (SS) and Voice
Conversion (VC). Other types of acoustic
signals, such as music and singing, are also included. However, the analysis can serve as a reference for any form of
audio-generating system or application.
The task of speech synthesis is to build natural-sounding synthetic voices with either knowledge-based
or data-based approaches [3]. A typical text-to-speech
(TTS) system is shown in Fig. 1. In the front-end, the raw
text is first converted into the equivalent written-out words, a step known as text normalization. Then
phonetic transcriptions are assigned to each word and
the text is divided into prosodic units. This step is
usually called text-to-phoneme conversion. The output
of the front-end is the symbolic linguistic representation. The back-end of the TTS system is referred to as
the synthesizer. The task of the synthesizer is to convert
the symbolic linguistic representation into sound.
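The front-end steps described above can be illustrated with a minimal sketch. It is a toy example under stated assumptions, not the pipeline of any particular TTS system: the lexicon, the ARPAbet-style phoneme symbols, the digit-by-digit number expansion, the punctuation-based phrasing rule and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lookup tables; a real front-end would use a full pronunciation
# dictionary and a grapheme-to-phoneme model for unseen words.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
LEXICON = {
    "yes": ["Y", "EH", "S"],
    "i": ["AY"],
    "have": ["HH", "AE", "V"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}

def normalize(text):
    """Text normalization: spell out each digit as a word and lowercase the text."""
    spelled = re.sub(r"[0-9]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    return spelled.lower()

def split_prosodic_units(text):
    """Crude prosodic phrasing (an assumption): break at punctuation marks."""
    return [u.strip() for u in re.split(r"[,.;:!?]+", text) if u.strip()]

def to_phonemes(word):
    """Text-to-phoneme conversion by dictionary lookup; unknown words are flagged."""
    return LEXICON.get(word, ["<unk>"])

def front_end(raw_text):
    """Return a symbolic linguistic representation:
    a list of prosodic units, each a list of (word, phonemes) pairs."""
    units = split_prosodic_units(normalize(raw_text))
    return [[(w, to_phonemes(w)) for w in re.findall(r"[a-z']+", unit)]
            for unit in units]

if __name__ == "__main__":
    for unit in front_end("Yes, I have 2 cats."):
        print(unit)
```

The nested list returned by `front_end` plays the role of the symbolic linguistic representation that a back-end synthesizer would consume.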
Figure 1. Overview of a typical TTS system. [Block diagram: Text, Front-End (Text Analysis, Linguistic Analysis), Back-End (Waveform Generation), Speech.]
Conventional speech synthesis technologies include diphone synthesis, concatenative speech synthesis and
statistical parametric speech synthesis (SPSS). Unit
selection based concatenative techniques have been
the main approaches to speech synthesis, in which the
quality of the generated speech depends on the available
corpora. Suitable sub-word units are automatically
chosen from selected corpora of natural speech [4]. Unit
selection based toolkits, like the open-source multilingual TTS synthesis platform MaryTTS [5], have been
embedded in commercial applications and can produce
synthetic speech of high quality. In contrast to unit
selection based methods, which retain unmodified speech components, statistical parametric speech
synthesis systems use parametric models to generate
universal descriptions of speech subsets with similar
sounding segments. Specifically, speech is described by models using parameters instead of stored
exemplars. These parameters are represented by statistics such as the means and variances of probability density
functions estimated from the training data. One of the most
popular statistical parametric synthesis techniques
is Hidden Markov Model (HMM) synthesis, which
allows more variation in the speech data by statistically modeling and generating the speech parameters
of a speech unit with HMMs trained under the maximum likelihood
criterion [6]. An annual challenge named the Blizzard Challenge¹ is held to compare research techniques
in corpus-based speech synthesizers, which advances
the understanding and exploration of effective
speech synthesis methods.
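To make the idea of describing speech by statistics rather than stored exemplars concrete, the following is a minimal sketch. It deliberately simplifies: each speech unit is modeled by a single diagonal Gaussian instead of an HMM with states, durations and dynamic features, and the function names, the frame labels and the `frames_per_unit` parameter are assumptions introduced for illustration.

```python
import numpy as np

def fit_unit_models(frames, labels):
    """Maximum likelihood estimate of a diagonal Gaussian per speech unit.

    frames: (T, D) array-like of acoustic feature vectors, one row per frame
    labels: length-T sequence of unit names, one per frame
    Returns {unit: (mean, variance)} with D-dimensional arrays.
    """
    frames = np.asarray(frames, dtype=float)
    labels = np.asarray(labels)
    models = {}
    for unit in np.unique(labels):
        x = frames[labels == unit]
        # The sample mean and the variance with ddof=0 are the ML estimates.
        models[str(unit)] = (x.mean(axis=0), x.var(axis=0))
    return models

def generate(models, unit_sequence, frames_per_unit=3):
    """Most likely output under independent per-frame Gaussians:
    repeat the mean vector of each unit for a fixed number of frames."""
    return np.vstack([np.tile(models[u][0], (frames_per_unit, 1))
                      for u in unit_sequence])

if __name__ == "__main__":
    train_frames = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
    train_labels = ["a", "a", "b"]
    models = fit_unit_models(train_frames, train_labels)
    print(generate(models, ["a", "b"], frames_per_unit=2))
```

Only the per-unit means and variances are kept after training; the original frames are discarded, which is the key difference from unit selection. Repeating the mean yields a piecewise-constant trajectory, which is one reason practical HMM synthesis also models dynamic features.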
Compared with speech synthesis, voice conversion is a different technique for creating high-quality synthetic speech. The task of voice conversion is to convert
a sentence spoken by an original speaker into a resultant
utterance of the same sentence that sounds as if it were spoken by a different
speaker. Information about the timbre and the prosody
of the original and target speakers is considered in a
voice conversion system. These two features are generally associated with the dynamic spectral envelope of
the voice signal, pitch/energy contours and the rhythmic
distribution of phonemes [7]. A diagram of a typical
voice conversion system is given in Fig. 2. In the training
phase, the speech signals from the source speakers and
the target speakers are fed into the VC system. After
speech analysis and mapping feature computation,
the raw speech signals are converted into a representation suitable for further processing and modification.
The speech segments of the training signals are aligned
with respect to time. The conversion function is then
trained on these aligned mapping features. In the conversion phase, the mapping features of a new input utterance are also extracted and then converted by the
¹ http://www.festvox.org/blizzard/blizzard2017.html.

Y. Zhao, X. Xia and R. Togneri are with the School of Electrical, Electronics and Computer Engineering, The University of Western Australia, Perth, WA
6009, Australia (e-mail: yuanjun.zhao@research.uwa.edu.au; xianjun.xia@research.uwa.edu.au; roberto.togneri@uwa.edu.au).
