this model has had is based on the theoretical foundations that
accompany its construction, providing solid support for many of
the algorithms developed in this field. Indeed, in the broad
scientific arena of data processing, Sparseland is quite unique
due to the synergy that exists between theory, algorithms,
and applications.
This article sets out from the story of Sparseland to discuss
two recent special cases of it: convolutional sparse coding (CSC)
and multilayer CSC (ML-CSC). As we shall see, these descendant
models pave a clear and surprising highway between sparse modeling
and deep-learning architectures. This article's main goal is to
present a novel theory for explaining deep (convolutional) neural
networks and their origins, all through the language of sparse
representations.
Clearly, our work is neither the only nor the first attempt to
theoretically explain deep learning. Indeed, various such attempts
have already appeared in the literature (e.g., [2]-[13]). Broadly
speaking, this knowledge stands on three pillars: the architectures
used, the algorithms and optimization aspects involved, and the
data these all serve. A comprehensive theory should cover all
three, but this seems to be a tough challenge. The existing works
typically cover one or two of these pillars, as in the following
examples:
■ Architecture: The work by Giryes et al. [9] showed that the
architectures used tend to preserve distances, and explained
the relevance of this property for classification. The work
reported in [7] and [8] analyzed the capacity of these
architectures to cover a specific family of functions.
■ Algorithms: Vidal's work [5] explained why local minima are
not to be feared, as they may coincide with the objective's
global minimum. Chaudhari and Soatto [6] proved that the
stochastic gradient descent algorithm induces an implicit
regularization that is related to the information-bottleneck
objective [3].
■ Data: Bruna and Mallat [4] motivated the architectures by the
data invariances that should be accounted for. Baraniuk's team
[2] developed a probabilistic generative model for the data
that, in turn, justifies the architectures used.
In this context, we should note that our work covers the data
pillar by modeling it, and the architecture pillar by deriving
the architectures from the model. Our work is close in spirit
to [2] but also markedly different.
A central idea that will accompany us in this article is that
Sparseland and its aforementioned descendants are all generative
models. By this we mean that they describe the signals of interest
by proposing a synthesis procedure for their creation. We argue
that such generative models facilitate a systematic pathway for
algorithm design, while also enabling a theoretical analysis of
their performance. Indeed, we will see throughout this article
how these two benefits go hand in hand. By relying on the
generative model, we will analyze certain feed-forward
convolutional neural networks (CNNs), identify key theoretical
weaknesses in them, and then tackle these by proposing new
architectures. Surprisingly, this journey will lead us to some
of the well-known feed-forward CNNs used today.
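To make the notion of a synthesis procedure concrete, here is a
minimal sketch of Sparseland-style generation (our illustration,
not the article's specification): a signal x is synthesized as
x = D·gamma, where D is a dictionary whose columns are atoms and
gamma is a sparse representation. The dimensions, the random
dictionary, and the uniformly drawn support below are all
illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the article):
# n-dimensional signals, m dictionary atoms, k nonzeros.
n, m, k = 64, 256, 5

# A random dictionary stands in for a learned one; each column is an atom.
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)  # normalize the atoms

# Synthesis procedure: choose k atoms at random and combine them linearly.
gamma = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
gamma[support] = rng.standard_normal(k)

x = D @ gamma  # a Sparseland signal: x = D * gamma, with gamma sparse
print(f"signal dimension: {x.size}, nonzeros in gamma: {np.count_nonzero(gamma)}")
```

Every signal drawn this way lies in a union of low-dimensional
subspaces, one per choice of support, and this is precisely the
kind of structure that the algorithms and theory discussed later
exploit.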
Standing at the horizon of this work is our desire to present
what is called the ML convolutional sparse modeling idea, as
it is the ground on which we derive all the previously claimed
results. Thus, we take this as our running title and build the
article, section by section, focusing each time on another word
in it. We start by explaining what we mean by the term modeling,
and then move to describe sparse modeling, essentially conveying
the story of Sparseland. Then we shift to convolutional sparse
modeling, presenting this model along with a recent and novel
analysis of it that relies on local sparsity. We conclude by
presenting ML convolutional sparse modeling, tying this to the
realm of deep learning, just as promised.
Before we start our journey, a few comments are in order.
1) Quoting Ron Kimmel (Computer Science Department at the
Technion, Israel), this grand task of attaching a theory to deep
learning behaves like a magic mirror, in which every researcher
sees himself. This explains the diverse explanations that have
accumulated, relying on information theory [3], passing through
wavelets and invariances [4], proposing a sparse modeling point
of view [1], and going all the way to partial differential
equations [10]. Indeed, it was David Donoho (Department of
Statistics at Stanford University) who strengthened this vivid
description by noting that this magic mirror is taken straight
from Snow White's story: it does not just show each researcher
his or her own reflection, but also accompanies it with warm
compliments, assuring them that their view is truly the best.
2) This article focuses mostly on the theoretical side of the
models we shall discuss, without delving into the proofs of the
theorems we state. This implies two things: 1) less emphasis
will be put on applications and experiments; and 2) the content
of this article is somewhat involved due to the delicate
theoretical statements presented, so be patient.
3) Echoed in this article is a keynote talk given by Michael Elad
during the 2017 IEEE International Conference on Image Processing
in Beijing; we follow this lecture closely, in both style and
content. The more recent results presented (on CSC and ML-CSC)
can be found in [1] and [14], but the description of the path
from Sparseland to deep learning as posed here differs
substantially.

Modeling
Our data is structured
Engineers and researchers rarely stop to wonder about our core
ability to process signals; we simply take it for granted. Why is
it that we can denoise signals? Why can we compress them, or
recover them from various degradations, or find anomalies in
them, or even recognize their content? The algorithms that tackle
these and many other tasks (separation, segmentation,
identification, interpolation, extrapolation, clustering,
prediction, synthesis, and more) all rely on one fundamental
property that only meaningful data sources obey: they
are all structured.
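To illustrate why structure is what makes such tasks possible at
all, here is a toy denoising sketch (our own construction, purely
illustrative): a signal that is sparse in the DCT basis is cleaned
by thresholding its transform coefficients, keeping only the few
that carry the structure. The signal, noise level, and threshold
are arbitrary choices.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(1)

# A toy structured signal: a few cosines, hence (nearly) sparse in the DCT basis.
n = 256
t = np.arange(n)
x = np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.cos(2 * np.pi * 17 * t / n)

y = x + 0.3 * rng.standard_normal(n)  # noisy observation

# Exploit the structure: keep only the large DCT coefficients.
c = dct(y, norm="ortho")
c[np.abs(c) < 1.0] = 0.0              # hard threshold (value picked by hand)
x_hat = idct(c, norm="ortho")

print(f"noisy RMSE:    {np.sqrt(np.mean((y - x) ** 2)):.3f}")
print(f"denoised RMSE: {np.sqrt(np.mean((x_hat - x) ** 2)):.3f}")
```

An unstructured source, such as pure noise, admits no such sparse
description, and thresholding would destroy it rather than clean
it; the structure is what lets us tell signal and noise apart.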
Each source of information we encounter in our everyday
lives exhibits an inner structure that is unique to it, and which
can be characterized in various ways. We may allude to redundancy in the data, assume it satisfies a self-similarity property,
