Computers & Operations Research, 2012, 39 (9): 2033-2050.
[37] Kingma D P, Ba J. Adam: A Method for Stochastic Optimization [C]// International Conference on Learning Representations, 2015.
[38] Stodola P. Using Metaheuristics on the Multi-Depot Vehicle Routing Problem with Modified Optimization Criterion [J]. 2018, 11 (5): 74.
[22] Zhang K, He F, Zhang Z, et al. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach [J]. Transportation Research Part C: Emerging Technologies, 2020, 121: 102861.
[23] Wang Wanliang, Chen Haoli, Li Guoqing, et al. Multi-depot vehicle routing based on deep reinforcement learning [J/OL]. Control and Decision: 1-9 [2022-04-21]. DOI: 10.13195/j.