Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (5): 575-585. doi: 10.16183/j.cnki.jsjtu.2019.277



  • About the first author: LI Peng (b. 1992), male, from Langfang, Hebei Province; Ph.D. candidate; main research interest: robot navigation.
  • Sponsored by: National Natural Science Foundation of China (61773027); Beijing Natural Science Foundation (4202005)

A Regionalization Vision Navigation Method Based on Deep Reinforcement Learning

LI Peng, RUAN Xiaogang, ZHU Xiaoqing, CHAI Jie, REN Dingqi, LIU Pengfei

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received:2019-09-26 Online:2021-05-28 Published:2021-06-01
  • Contact: ZHU Xiaoqing E-mail:alex.zhuxq@bjut.edu.cn


Abstract:

To address the navigation problem of mobile robots in distributed environments, a regionalized visual navigation method based on deep reinforcement learning is proposed. First, considering the characteristics of a distributed environment, control strategies are learned independently in different regions, and a regionalization model is built to switch between and combine these strategies during navigation. Then, to give the robot better goal-oriented behavior, a reward prediction task is integrated into each regional navigation submodule, and reward sequences are replayed from the experience pool. Finally, a depth constraint is added to the original exploration strategy to prevent traversal stagnation caused by collisions. The results show that reward prediction and depth-based obstacle avoidance help improve navigation performance. In multi-region environment tests, the regionalization model shows advantages in training time and accumulated reward that a single model does not have, indicating that it can better handle large-scale navigation. In addition, the experiments are conducted in a first-person 3D environment in which the state is only partially observable, which is conducive to practical application.
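The three ingredients of the abstract (one independently learned policy per region with switching at region boundaries, a reward-prediction auxiliary task fed from a replay buffer, and a depth constraint on exploration) can be roughly sketched as below. This is a minimal illustrative sketch only: the class, thresholds, and action names are assumptions for exposition, not the authors' implementation.

```python
import random
from collections import deque

class RegionalizedNavigator:
    """Illustrative sketch (not the paper's code): one control policy per
    region, switched by region id; a depth check vetoes forward motion that
    would cause a collision; a replay buffer stores (observation, reward)
    pairs for the auxiliary reward-prediction task."""

    def __init__(self, region_ids, actions, depth_threshold=0.5):
        # One independently trained control policy per region (placeholder:
        # a lookup table mapping observations to preferred actions).
        self.policies = {r: {} for r in region_ids}
        self.actions = actions
        self.depth_threshold = depth_threshold   # assumed minimum safe depth
        self.reward_buffer = deque(maxlen=1000)  # replayed for reward prediction

    def act(self, region_id, observation, depth_map):
        # Regionalization: select the policy belonging to the current region.
        policy = self.policies[region_id]
        action = policy.get(observation, random.choice(self.actions))
        # Depth-constrained exploration: if the nearest visible depth is
        # below the threshold, avoid moving forward so the robot does not
        # stall against an obstacle during traversal.
        if action == "forward" and min(depth_map) < self.depth_threshold:
            action = random.choice([a for a in self.actions if a != "forward"])
        return action

    def record(self, observation, reward):
        # Store reward sequences; the auxiliary task replays these and
        # learns to predict the reward from the observation.
        self.reward_buffer.append((observation, reward))

nav = RegionalizedNavigator(region_ids=["room_A", "room_B"],
                            actions=["forward", "left", "right"])
# Nearest depth 0.2 is below the 0.5 threshold, so "forward" is vetoed.
a = nav.act("room_A", observation="o0", depth_map=[0.2, 0.3, 0.4])
nav.record("o0", reward=0.0)
```

In this sketch, switching policies amounts to indexing `self.policies` by region id; in the paper the regionalization model additionally combines strategies across regions, which the placeholder lookup does not capture.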

Key words: deep reinforcement learning, distributed environment, regionalization model, reward prediction, depth obstacle avoidance

CLC number: