Signal Processing - November 2017 - 38

[48] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous methods for deep reinforcement learning," in Proc. Int. Conf. Machine Learning, 2016, pp. 1928-1937.
[49] S. Mohamed and D. J. Rezende, "Variational information maximisation for intrinsically motivated reinforcement learning," in Proc. Neural Information Processing Systems, 2015, pp. 2125-2133.
[50] O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans. (2017). Bridging the gap between value and policy based reinforcement learning. arXiv. [Online]. Available: https://arxiv.org/abs/1702.08892
[51] A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. de Maria, V. Panneershelvam, M. Suleyman, et al., "Massively parallel methods for deep reinforcement learning," in ICML Workshop on Deep Learning, 2015.
[52] A. Y. Ng and S. J. Russell, "Algorithms for inverse reinforcement learning," in Proc. Int. Conf. Machine Learning, 2000, pp. 663-670.
[53] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang, "Autonomous inverted helicopter flight via reinforcement learning," in Proc. Int. Symp. Experimental Robotics, 2006, pp. 363-372.
[54] B. O'Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," in Proc. Int. Conf. Learning Representations, 2017.
[55] J. Oh, X. Guo, H. Lee, R. L. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," in Proc. Neural Information Processing Systems, 2015, pp. 2863-2871.
[56] I. Osband, C. Blundell, A. Pritzel, and B. van Roy, "Deep exploration via bootstrapped DQN," in Proc. Neural Information Processing Systems, 2016, pp. 4026-4034.
[57] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, "Curiosity-driven exploration by self-supervised prediction," in Proc. Int. Conf. Machine Learning, 2017, pp. 2778-2787.
[58] P. Peng, Q. Yuan, Y. Wen, Y. Yang, Z. Tang, H. Long, and J. Wang. (2017). Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. arXiv. [Online]. Available: https://arxiv.org/abs/1703.10069
[59] D. A. Pomerleau, "ALVINN, an autonomous land vehicle in a neural network," in Proc. Neural Information Processing Systems, 1989, pp. 305-313.
[60] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," in Proc. Int. Conf. Machine Learning, 2014, pp. 1278-1286.
[61] M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in Proc. European Conf. Machine Learning, 2005, pp. 317-328.
[62] G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Dept. Engineering, Univ. Cambridge, Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR 166, 1994.
[63] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. (2016). Progressive neural networks. arXiv. [Online]. Available: https://arxiv.org/abs/1606.04671
[64] A. A. Rusu, M. Vecerik, T. Rothörl, N. Heess, R. Pascanu, and R. Hadsell. (2016). Sim-to-real robot learning from pixels with progressive nets. arXiv. [Online]. Available: https://arxiv.org/abs/1610.04286
[65] T. Salimans, J. Ho, X. Chen, and I. Sutskever. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv. [Online]. Available: https://arxiv.org/abs/1703.03864
[66] T. Schaul, D. Horgan, K. Gregor, and D. Silver, "Universal value function approximators," in Proc. Int. Conf. Machine Learning, 2015, pp. 1312-1320.
[67] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," in Proc. Int. Conf. Learning Representations, 2016.
[68] J. Schmidhuber, "A possibility for implementing curiosity and boredom in model-building neural controllers," in Proc. Int. Conf. Simulation Adaptive Behavior, 1991, pp. 222-227.
[69] J. Schmidhuber and R. Huber, "Learning to generate artificial fovea trajectories for target detection," Int. J. Neural Syst., vol. 2, no. 1-2, pp. 125-134, 1991. [Online]. Available: http://www.worldscientific.com/doi/abs/10.1142/S012906579100011X
[70] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in Proc. Int. Conf. Machine Learning, 2015, pp. 1889-1897.
[71] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, "High-dimensional continuous control using generalized advantage estimation," in Proc. Int. Conf. Learning Representations, 2016.
[72] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. Int. Conf. Machine Learning, 2014, pp. 387-395.
[73] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
[74] S. Singh, D. Litman, M. Kearns, and M. Walker, "Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system," J. Artificial Intell. Res., vol. 16, pp. 105-133, Feb. 2002.
[75] B. C. Stadie, S. Levine, and P. Abbeel, "Incentivizing exploration in reinforcement learning with deep predictive models," in NIPS Workshop on Deep Reinforcement Learning, 2015.
[76] A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman, "PAC model-free reinforcement learning," in Proc. Int. Conf. Machine Learning, 2006, pp. 881-888.
[77] S. Sukhbaatar, A. Szlam, and R. Fergus, "Learning multiagent communication with backpropagation," in Proc. Neural Information Processing Systems, 2016, pp. 2244-2252.
[78] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[79] R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artificial Intell., vol. 112, no. 1-2, pp. 181-211, 1999.
[80] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel, "Value iteration networks," in Proc. Neural Information Processing Systems, 2016, pp. 2154-2162.
[81] G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, 1995.
[82] C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, and S. Mannor, "A deep hierarchical approach to lifelong learning in Minecraft," in Proc. Association for the Advancement of Artificial Intelligence, 2017, pp. 1553-1561.
[83] J. N. Tsitsiklis and B. van Roy, "Analysis of temporal-difference learning with function approximation," in Proc. Neural Information Processing Systems, 1997, pp. 1075-1081.
[84] E. Tzeng, C. Devin, J. Hoffman, C. Finn, X. Peng, S. Levine, K. Saenko, and T. Darrell, "Towards adapting deep visuomotor representations from simulated to real environments," in Workshop Algorithmic Foundations Robotics, 2016.
[85] N. Usunier, G. Synnaeve, Z. Lin, and S. Chintala, "Episodic exploration for deep deterministic policies: An application to StarCraft micromanagement tasks," in Proc. Int. Conf. Learning Representations, 2017.
[86] H. van Hasselt, "Double Q-learning," in Proc. Neural Information Processing Systems, 2010, pp. 2613-2621.
[87] H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. Association for the Advancement of Artificial Intelligence, 2016, pp. 2094-2100.
[88] A. Vezhnevets, V. Mnih, S. Osindero, A. Graves, O. Vinyals, J. Agapiou, and K. Kavukcuoglu, "Strategic attentive writer for learning macro-actions," in Proc. Neural Information Processing Systems, 2016, pp. 3486-3494.
[89] A. S. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg, D. Silver, and K. Kavukcuoglu, "FeUdal networks for hierarchical reinforcement learning," in Proc. Int. Conf. Machine Learning, 2017, pp. 3540-3549.
[90] N. Wahlström, T. B. Schön, and M. P. Deisenroth, "Learning deep dynamical models from image pixels," in Proc. IFAC Symp. System Identification, 2015, pp. 1059-1064.
[91] N. Wahlström, T. B. Schön, and M. P. Deisenroth, "From pixels to torques: Policy learning with deep dynamical models," in ICML Workshop on Deep Learning, 2015.
[92] Z. Wang, N. de Freitas, and M. Lanctot, "Dueling network architectures for deep reinforcement learning," in Proc. Int. Conf. Learning Representations, 2016.
[93] Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," in Proc. Int. Conf. Learning Representations, 2017.
[94] C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[95] M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller, "Embed to control: A locally linear latent dynamics model for control from raw images," in Proc. Neural Information Processing Systems, 2015, pp. 2746-2754.
[96] D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber, "Recurrent policy gradients," Logic J. IGPL, vol. 18, no. 5, pp. 620-634, 2010.
[97] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Mach. Learning, vol. 8, no. 3-4, pp. 229-256, 1992.
[98] M. Wulfmeier, P. Ondruska, and I. Posner, "Maximum entropy deep inverse reinforcement learning," in NIPS Workshop on Deep Reinforcement Learning, 2015.
[99] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Machine Learning, 2015, pp. 2048-2057.
[100] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, "Target-driven visual navigation in indoor scenes using deep reinforcement learning," in Proc. IEEE Int. Conf. Robotics and Automation, 2017, pp. 3357-3364.

