Multi Agent Deep Learning with Cooperative Communication
Open Access | May 2020

Abstract

We consider the problem of multiple agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve their tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach in which agents learn policies and communication protocols simultaneously. A3C2 follows a centralized-learning, distributed-execution paradigm, and supports independent agents, dynamic team sizes, partially-observable environments, and noisy communication. We show that A3C2 outperforms other state-of-the-art approaches in multiple environments.
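To make the idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of an agent whose network produces a policy, a value estimate, and an outgoing message in a single differentiable forward pass; teammates receive each other's messages as part of their input on the next step. All class and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class CommAgent:
    """Toy actor-critic agent with a message head (illustrative sketch only)."""

    def __init__(self, obs_dim, msg_dim, n_actions, hidden=16):
        # One shared hidden layer feeds three output heads.
        self.W_h = rng.normal(0, 0.1, (obs_dim + msg_dim, hidden))
        self.W_pi = rng.normal(0, 0.1, (hidden, n_actions))  # actor: action logits
        self.W_v = rng.normal(0, 0.1, (hidden, 1))           # critic: state value
        self.W_m = rng.normal(0, 0.1, (hidden, msg_dim))     # learned message

    def step(self, obs, incoming_msg):
        # Observation and received message are concatenated, so the
        # communication protocol is learned jointly with the policy.
        x = np.concatenate([obs, incoming_msg])
        h = np.tanh(x @ self.W_h)
        logits = h @ self.W_pi
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                 # softmax action distribution
        value = float(h @ self.W_v)
        out_msg = np.tanh(h @ self.W_m)      # broadcast to teammates next step
        return probs, value, out_msg

# Two agents exchanging messages over one environment step.
a, b = CommAgent(4, 3, 2), CommAgent(4, 3, 2)
msg_a = msg_b = np.zeros(3)
p_a, v_a, msg_a = a.step(rng.normal(size=4), msg_b)
p_b, v_b, msg_b = b.step(rng.normal(size=4), msg_a)
```

During training, gradients from each agent's actor-critic loss can flow back through the received messages into the sender's message head, which is what makes the communication protocol learnable end to end; at execution time each agent runs its own copy of the network, matching the distributed-execution setting described above.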

Language: English
Page range: 189 - 207
Submitted on: Nov 1, 2019 | Accepted on: Mar 26, 2020 | Published on: May 23, 2020
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2020 David Simões, Nuno Lau, Luís Paulo Reis, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.