
We then described how to eliminate the need for simulators or forward models by using reinforcement learning to learn autonomous, self-improving behaviors in cooperative-competitive environments with sparse rewards, such as Total War battles. We showed how a basic architecture can be combined with techniques such as curriculum learning and reward shaping to learn diverse and complex behaviors for scenarios with up to 6v6 units of mixed types; the highest-difficulty built-in AI was defeated in 77% of the games. To increase data efficiency and to scale better to scenarios with more units, we attempted a hierarchical RL approach, but obtained mixed results, likely because of poor goal abstraction choices. A few options to be explored in future work were mentioned in Subsection IV-D.
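As an illustration of the reward-shaping component mentioned above, the snippet below sketches potential-based reward shaping in the spirit of [37], [38]: a term F(s, s') = γΦ(s') - Φ(s) is added to the sparse environment reward, which densifies feedback while provably leaving optimal policies unchanged. The potential used here (difference in remaining unit health) and the data structures are hypothetical illustrations, not the exact reward functions used in our experiments.

```python
from dataclasses import dataclass
from typing import List

GAMMA = 0.99  # discount factor of the underlying RL problem

@dataclass
class Unit:
    health: float

@dataclass
class State:
    allies: List[Unit]
    enemies: List[Unit]

def potential(state: State) -> float:
    # Hypothetical potential: our team's remaining health minus the enemy's.
    return (sum(u.health for u in state.allies)
            - sum(u.health for u in state.enemies))

def shaped_reward(env_reward: float, s: State, s_next: State) -> float:
    # Potential-based shaping adds F(s, s') = GAMMA * phi(s') - phi(s)
    # to the environment reward; a sparse win/loss signal thus becomes
    # denser whenever relative health changes between steps.
    return env_reward + GAMMA * potential(s_next) - potential(s)
```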
In addition, other data-efficient methods should be investigated. One idea is to use "world models", which learn compressed spatial and temporal representations of the environment and can be trained relatively quickly in an unsupervised manner [41]. Agents can then be trained, even entirely, on "hallucinated" action trajectories generated by the world model, and it has been shown that such policies can transfer surprisingly well to actual environments.
We found that designing good individual reward functions is very important to overall learning stability, sometimes requiring extensive tuning to obtain convincing results. Using a team reward signal and learning how to assign credit is a more elegant solution, which has proved successful in other cooperative multi-agent environments [42], [43]. Minimal sketches of both ideas are given below.
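To make the world-model idea concrete, the following is a minimal sketch of rolling a policy forward entirely inside a learned dynamics model, in the spirit of [41]. It assumes vector observations rather than images and uses a plain linear layer in place of the paper's variational autoencoder; all dimensions and module choices are illustrative, not taken from [41].

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, ACTION_DIM = 128, 32, 8  # illustrative sizes

encoder = nn.Linear(OBS_DIM, LATENT_DIM)       # stands in for a VAE encoder
dynamics = nn.GRUCell(LATENT_DIM + ACTION_DIM, LATENT_DIM)  # learned transition model
policy = nn.Linear(LATENT_DIM, ACTION_DIM)     # small controller

def dream_rollout(first_obs: torch.Tensor, horizon: int = 10):
    # Roll the policy forward inside the model only; the real environment
    # is never queried, so "hallucinated" experience is cheap to generate.
    z = encoder(first_obs)                      # compress the first observation
    h = torch.zeros(first_obs.shape[0], LATENT_DIM)
    trajectory = []
    for _ in range(horizon):
        a = torch.tanh(policy(z))               # act on the current latent
        h = dynamics(torch.cat([z, a], dim=-1), h)
        z = h                                   # model's predicted next latent
        trajectory.append((z, a))
    return trajectory

# Example: a batch of 4 hallucinated trajectories of length 10.
dream = dream_rollout(torch.randn(4, OBS_DIM))
```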
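For the team-reward alternative, here is a minimal value-decomposition sketch in the style of VDN [42]: each agent learns a Q-network over its local observation, and the joint action value is their sum, so a single team reward trains all agents and credit is assigned implicitly through the decomposition (QMIX [43] replaces the sum with a monotonic mixing network). Network sizes and helper names are illustrative, not our implementation.

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    # Per-agent Q-network over the agent's local observation.
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def team_q(agent_nets, observations, actions):
    # VDN decomposition: Q_tot(s, a) = sum_i Q_i(o_i, a_i). Training Q_tot
    # against a single team reward backpropagates through every agent's
    # network, implicitly assigning credit to each agent.
    per_agent_q = [net(obs).gather(-1, act.unsqueeze(-1)).squeeze(-1)
                   for net, obs, act in zip(agent_nets, observations, actions)]
    return torch.stack(per_agent_q).sum(dim=0)

# Example: three agents with 16-dim observations and 5 discrete actions.
nets = [AgentQNet(16, 5) for _ in range(3)]
obs = [torch.randn(8, 16) for _ in range(3)]       # batch of 8 transitions
acts = [torch.randint(0, 5, (8,)) for _ in range(3)]
q_total = team_q(nets, obs, acts)                   # shape: (8,)
```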
VI. Acknowledgments

This work is based on material presented in two conference papers, [27] and [28], and on internship work at Creative Assembly.
References

[1] R. Koster and W. Wright, A Theory of Fun for Game Design. Paraglyph Press, 2004.
[2] D. Silver et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, p. 354, Oct. 2017.
[3] S. Ontañón, "The combinatorial multi-armed bandit problem and its application to RTS games," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2013, pp. 58-64.
[4] D. Churchill, A. Saffidine, and M. Buro, "Fast heuristic search for RTS game combat scenarios," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2012, pp. 112-117.
[5] A. Uriarte and S. Ontañón, "Single believe state generation for partially observable real-time strategy games," in Proc. IEEE Conf. Computational Intelligence and Games (CIG), Aug. 2017, pp. 296-303.
[6] D. Churchill and M. Buro, "Incorporating search algorithms into RTS game agents," in Proc. AIIDE Workshop on Artificial Intelligence in Adversarial Real-Time Games, Oct. 2012.
[7] D. Churchill and M. Buro, "Portfolio greedy search and simulation for large-scale combat in StarCraft," in Proc. IEEE Conf. Computational Intelligence and Games (CIG), Oct. 2013, pp. 1-8.
[8] N. A. Barriga, M. Stanescu, and M. Buro, "Game tree search based on nondeterministic action scripts in real-time strategy games," IEEE Trans. Games, vol. 10, no. 1, pp. 69-77, Mar. 2018.
[9] S. Ontañón and M. Buro, "Adversarial hierarchical-task network planning for complex real-time games," in Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), July 2015, pp. 1652-1658.
[10] A. Uriarte and S. Ontañón, "Game-tree search over high-level game states in RTS games," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2014, pp. 73-79.
[11] D. Churchill and M. Buro, "Incorporating search algorithms into RTS game agents," in Proc. AIIDE Workshop on Artificial Intelligence in Adversarial Real-Time Games, Oct. 2012.
[12] S. Ontañón, "Combinatorial multi-armed bandits for real-time strategy games," J. Artif. Intell. Res., vol. 58, pp. 665-702, Mar. 2017.

[13] L. H. Lelis, "Stratified strategy selection for unit control in real-time strategy games," in Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), Aug. 2017, pp. 3735-3741.
[14] R. O. Moraes, J. R. Mariño, L. H. Lelis, and M. A. Nascimento, "Action abstractions for combinatorial multi-armed bandit tree search," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Nov. 2018, pp. 74-80.
[15] R. O. Moraes and L. H. Lelis, "Asymmetric action abstractions for multi-unit control in adversarial real-time games," in Proc. AAAI Conf. Artificial Intelligence (AAAI), Feb. 2017, pp. 876-883.
[16] A. Kovarsky and M. Buro, "Heuristic search applied to abstract combat games," in Proc. Advances in Artificial Intelligence, May 2005, pp. 66-78.
[17] M. Stanescu, S. P. Hernandez, G. Erickson, R. Greiner, and M. Buro, "Predicting army combat outcomes in StarCraft," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2013, pp. 86-92.
[18] M. Stanescu, N. A. Barriga, and M. Buro, "Using Lanchester attrition laws for combat prediction in StarCraft," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Nov. 2015, pp. 86-92.
[19] G. Erickson and M. Buro, "Global state evaluation in StarCraft," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2014, pp. 112-118.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances in Neural Information Processing Systems (NIPS), Dec. 2012, pp. 1097-1105.
[21] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016.
[22] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, Feb. 2015.
[23] M. Campbell, A. J. Hoane, and F.-H. Hsu, "Deep Blue," Artif. Intell., vol. 134, no. 1, pp. 57-83, Jan. 2002.
[24] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Mar. 1998.
[25] V. Mnih et al., "Asynchronous methods for deep reinforcement learning," in Proc. Int. Conf. Machine Learning (ICML), June 2016, pp. 1928-1937.
[26] Y. Li, "Deep reinforcement learning," CoRR, vol. abs/1810.06339, 2018. [Online]. Available: http://arxiv.org/abs/1810.06339
[27] N. A. Barriga, M. Stanescu, and M. Buro, "Combining strategic learning and tactical search in real-time strategy games," in Proc. AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Oct. 2017, pp. 9-15.
[28] M. Stanescu, N. A. Barriga, A. Hess, and M. Buro, "Evaluating real-time strategy game states using convolutional neural networks," in Proc. IEEE Conf. Computational Intelligence and Games (CIG), Sept. 2016.
[29] H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. AAAI Conf. Artificial Intelligence (AAAI), Feb. 2016, pp. 2094-2100.
[30] V. Mnih et al., "Playing Atari with deep reinforcement learning," in Proc. NIPS Deep Learning Workshop, Dec. 2013.
[31] O. Vinyals et al., "StarCraft II: A new challenge for reinforcement learning," DeepMind, Blizzard, Tech. Rep., 2017.
[32] Z. Wang, N. de Freitas, and M. Lanctot, "Dueling network architectures for deep reinforcement learning," CoRR, vol. abs/1511.06581, 2015. [Online]. Available: http://arxiv.org/abs/1511.06581
[33] M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents," in Proc. Int. Conf. Machine Learning (ICML), June 1993, pp. 330-337.
[34] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, "Counterfactual multi-agent policy gradients," CoRR, vol. abs/1705.08926, 2017. [Online]. Available: http://arxiv.org/abs/1705.08926
[35] J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel, "Multi-agent reinforcement learning in sequential social dilemmas," CoRR, vol. abs/1702.03037, 2017. [Online]. Available: http://arxiv.org/abs/1702.03037
[36] P. Peng et al., "Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games," CoRR, vol. abs/1703.10069, 2017. [Online]. Available: http://arxiv.org/abs/1703.10069
[37] S. Devlin, L. Yliniemi, D. Kudenko, and K. Tumer, "Potential-based difference rewards for multiagent reinforcement learning," in Proc. Int. Conf. Autonomous Agents and Multi-Agent Systems, May 2014, pp. 165-172.
[38] A. Eck, L.-K. Soh, S. Devlin, and D. Kudenko, "Potential-based reward shaping for finite horizon online POMDP planning," Auton. Agents Multi-Agent Syst., vol. 30, no. 3, pp. 403-445, Mar. 2016.
[39] J. Foerster, N. Nardelli, G. Farquhar, P. H. S. Torr, P. Kohli, and S. Whiteson, "Stabilising experience replay for deep multi-agent reinforcement learning," in Proc. Int. Conf. Machine Learning (ICML), Aug. 2017.
[40] P. Sun et al., "TStarBots: Defeating the cheating level builtin AI in StarCraft II in the full game," CoRR, vol. abs/1809.07193, 2018. [Online]. Available: http://arxiv.org/abs/1809.07193
[41] D. Ha and J. Schmidhuber, "World models," CoRR, vol. abs/1803.10122, 2018. [Online]. Available: http://arxiv.org/abs/1803.10122
[42] P. Sunehag et al., "Value-decomposition networks for cooperative multi-agent learning," CoRR, vol. abs/1706.05296, 2017. [Online]. Available: http://arxiv.org/abs/1706.05296
[43] T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning," CoRR, vol. abs/1803.11485, 2018. [Online]. Available: http://arxiv.org/abs/1803.11485

