IEEE 16th International Conference on Automation Science and Engineering (CASE), 2020: 1257-1262.
[19] Ding Z, Huang T, Lu Z. Learning individually inferred communication for multi-agent cooperation[J]. arXiv:2006.06455, 2020.
[20] Das A, Gervet T, Romoff J, et al. TarMAC: targeted multi-agent communication[C]//International Conference on Machine
[21] Ditmarsch H V, Kooi B. One hundred prisoners and a light bulb[M]. USA: Springer International Publishing, 2015: 83-94.
[22] Foerster J N, Assael Y M, de Freitas N, et al. Learning to communicate to solve riddles with deep distributed recurrent q-networks[J]. arXiv:1602.02672, 2016.
Chinese references:
[1] Sun Yu, Cao Lei, Chen Xiliang, et al. Survey of multi-agent deep reinforcement learning[J]. Computer Engineering and Applications, 2020, 56(5): 13-24.