Signal Processing - November 2017 - 37

Washington University, Washington, D.C., in 2010. He is a
Ph.D. degree candidate in the Human and Social Dimensions
of Science and Technology Department at Arizona State
University and a research fellow at the University of Oxford's
Future of Humanity Institute. His research focuses on governance issues related to artificial intelligence.
Anil Anthony Bharath (a.bharath@ic.ac.uk) received his B.Eng.
degree in electronic and electrical engineering from
University College London in 1988 and his Ph.D. degree in signal processing from Imperial College London in 1993, where he
is currently a reader in the Department of Bioengineering. He is
also a fellow of the Institution of Engineering and Technology
and a cofounder of Cortexica Vision Systems. He was previously
an academic visitor in the Signal Processing Group at the
University of Cambridge in 2006. His research interests are in
deep architectures for visual inference.

References

[1] K. Arulkumaran, N. Dilokthanakul, M. Shanahan, and A. A. Bharath, "Classifying options for deep reinforcement learning," in Proc. IJCAI Workshop Deep Reinforcement Learning: Frontiers and Challenges, 2016.
[2] P. Bacon, J. Harb, and D. Precup, "The option-critic architecture," in Proc. Association Advancement Artificial Intelligence, 2017, pp. 1726-1734.
[3] L. C. Baird III, "Advantage updating," Defense Tech. Inform. Center, Tech. Report D-A280 862, Fort Belvoir, VA, 1993.
[4] M. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos, "Unifying count-based exploration and intrinsic motivation," in Proc. Neural Information Processing Systems, 2016, pp. 1471-1479.
[5] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, "The arcade learning environment: an evaluation platform for general agents," in Proc. Int. Joint Conf. Artificial Intelligence, 2015, pp. 253-279.
[6] R. Bellman, "On the theory of dynamic programming," Proc. Nat. Acad. Sci., vol. 38, no. 8, pp. 716-719, 1952.
[7] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798-1828, 2013.
[8] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Trans. Syst., Man, Cybern., vol. 38, no. 2, pp. 156-172, 2008.
[9] M. Campbell, A. J. Hoane, and F. Hsu, "Deep Blue," Artificial Intell., vol. 134, no. 1-2, pp. 57-83, 2002.
[10] S. Chiappa, S. Racaniere, D. Wierstra, and S. Mohamed, "Recurrent environment simulators," in Proc. Int. Conf. Learning Representations, 2017.
[11] P. Christiano, Z. Shah, I. Mordatch, J. Schneider, T. Blackwell, J. Tobin, P. Abbeel, and W. Zaremba. (2016). Transfer from simulation to real world through learning deep inverse dynamics model. arXiv. [Online]. Available: https://arxiv.org/abs/1610.03518
[12] M. P. Deisenroth, G. Neumann, and J. Peters, "A survey on policy search for robotics," Foundations and Trends in Robotics, vol. 2, no. 1-2, pp. 1-142, 2013.
[13] M. Denil, P. Agrawal, T. D. Kulkarni, T. Erez, P. Battaglia, and N. de Freitas, "Learning to perform physics experiments via deep reinforcement learning," in Proc. Int. Conf. Learning Representations, 2017.
[14] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, "Deep spatial autoencoders for visuomotor learning," in Proc. IEEE Int. Conf. Robotics and Automation, 2016, pp. 512-519.
[15] J. Foerster, Y. M. Assael, N. de Freitas, and S. Whiteson, "Learning to communicate with deep multi-agent reinforcement learning," in Proc. Neural Information Processing Systems, 2016, pp. 2137-2145.
[16] M. Garnelo, K. Arulkumaran, and M. Shanahan, "Towards deep symbolic reinforcement learning," in NIPS Workshop on Deep Reinforcement Learning, 2016.
[17] F. Gomez and J. Schmidhuber, "Evolving modular fast-weight networks for control," in Proc. Int. Conf. Artificial Neural Networks, 2005, pp. 383-389.
[18] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. Neural Information Processing Systems, 2014, pp. 2672-2680.
[19] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, "Continuous deep Q-learning with model-based acceleration," in Proc. Int. Conf. Learning Representations, 2016.
[20] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," in Association for the Advancement of Artificial Intelligence Fall Symp. Series, 2015.
[21] N. Heess, J. J. Hunt, T. P. Lillicrap, and D. Silver, "Memory-based control with recurrent neural networks," in NIPS Workshop on Deep Reinforcement Learning, 2015.
[22] N. Heess, G. Wayne, D. Silver, T. Lillicrap, T. Erez, and Y. Tassa, "Learning continuous control policies by stochastic value gradients," in Proc. Neural Information Processing Systems, 2015, pp. 2944-2952.
[23] T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, et al. (2017). Learning from demonstrations for real world reinforcement learning. arXiv. [Online]. Available: https://arxiv.org/abs/1704.03732
[24] J. Ho and S. Ermon, "Generative adversarial imitation learning," in Proc. Neural Information Processing Systems, 2016, pp. 4565-4573.
[25] R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. de Turck, and P. Abbeel, "VIME: Variational information maximizing exploration," in Proc. Neural Information Processing Systems, 2016, pp. 1109-1117.
[26] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu, "Reinforcement learning with unsupervised auxiliary tasks," in Proc. Int. Conf. Learning Representations, 2017.
[27] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains," Artificial Intell., vol. 101, no. 1, pp. 99-134, 1998.
[28] S. M. Kakade, "A natural policy gradient," in Proc. Neural Information Processing Systems, 2002, pp. 1531-1538.
[29] K. Kansky, T. Silver, D. A. Mély, M. Eldawy, M. Lázaro-Gredilla, X. Lou, N. Dorfman, S. Sidor, S. Phoenix, and D. George, "Schema networks: zero-shot transfer with a generative causal model of intuitive physics," in Proc. Int. Conf. Machine Learning, 2017, pp. 1809-1818.
[30] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," in Proc. Int. Conf. Learning Representations, 2014.
[31] N. Kohl and P. Stone, "Policy gradient reinforcement learning for fast quadrupedal locomotion," in Proc. IEEE Int. Conf. Robotics and Automation, 2004, pp. 2619-2624.
[32] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
[33] J. Koutník, G. Cuccu, J. Schmidhuber, and F. Gomez, "Evolving large-scale neural networks for vision-based reinforcement learning," in Proc. Conf. Genetic and Evolutionary Computation, 2013, pp. 1061-1068.
[34] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation," in Proc. Neural Information Processing Systems, 2016, pp. 3675-3683.
[35] T. D. Kulkarni, A. Saeedi, S. Gautam, and S. J. Gershman, "Deep successor reinforcement learning," in NIPS Workshop on Deep Reinforcement Learning, 2016.
[36] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, no. 1, pp. 4-22, 1985.
[37] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, "Building machines that learn and think like people," Behavioral Brain Sci., pp. 1-101, 2016. [Online]. Available: https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/building-machines-that-learn-and-think-like-people/A9535B1D745A0377E16C590E14B94993
[38] S. Lange, M. Riedmiller, and A. Voigtlander, "Autonomous reinforcement learning on raw visual input data in a real world application," in Proc. Int. Joint Conf. Neural Networks, 2012, pp. 1-8.
[39] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[40] S. Levine and V. Koltun, "Guided policy search," in Proc. Int. Conf. Learning Representations, 2013.
[41] S. Levine, C. Finn, T. Darrell, and P. Abbeel, "End-to-end training of deep visuomotor policies," J. Mach. Learning Res., vol. 17, no. 39, pp. 1-40, 2016.
[42] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, "Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection," in Proc. Int. Symp. Experimental Robotics, 2016, pp. 173-184.
[43] Y. Li. (2017). Deep reinforcement learning: An overview. arXiv. [Online]. Available: https://arxiv.org/abs/1701.07274
[44] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," in Proc. Int. Conf. Learning Representations, 2016.
[45] L. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Mach. Learning, vol. 8, no. 3-4, pp. 293-321, 1992.
[46] P. Mirowski, R. Pascanu, F. Viola, H. Soyer, A. Ballard, A. Banino, M. Denil, R. Goroshin, et al., "Learning to navigate in complex environments," in Proc. Int. Conf. Learning Representations, 2017.
[47] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.

IEEE SIGNAL PROCESSING MAGAZINE | November 2017 | 37


