J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (2): 399-416.doi: 10.1007/s12204-023-2615-x

• Automation & Computer Science •

Efficient Fully Convolutional Network and Optimization Approach for Robotic Grasping Detection Based on RGB-D Images

NIE Wei, LIANG Xinwu

  School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China
  • Accepted: 2022-02-22; Online: 2025-03-21; Published: 2025-03-21

Abstract: Robot grasp detection is a fundamental vision task for robots. Deep learning-based methods have shown excellent results in enhancing grasp detection for model-free objects in unstructured scenes. Most popular approaches explore deep network models and exploit RGB-D images, combining colour and depth data to obtain enriched feature representations. However, current work struggles to strike a satisfactory balance between accuracy and real-time performance, pays inadequate attention to the differing distributions of RGB and depth features, and lacks a treatment of failed predictions. We propose an efficient fully convolutional network that predicts pixel-level antipodal grasp parameters from RGB-D images. A hierarchical feature-fusion structure is built from multiple lightweight feature extraction blocks, and a feature fusion module with 3D global attention is used to fully select the complementary information in the RGB and depth images. In addition, a grasp configuration optimization method based on the local grasp path is proposed to cope with possible prediction failures. Extensive experiments on two public grasping datasets, Cornell and Jacquard, demonstrate that the approach improves the performance of grasping unknown objects.
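To make the pixel-level prediction concrete, the sketch below decodes dense grasp maps into a single antipodal grasp configuration. This is not the paper's actual network head; it is a minimal NumPy illustration of the grasp-map representation commonly used by pixel-wise methods, assuming the usual quality/angle/width maps with the rotation angle stored in a doubled-angle (cos 2θ, sin 2θ) encoding. The function name `decode_grasp_maps` and the map names are illustrative, not from the paper.

```python
import numpy as np

def decode_grasp_maps(quality, cos2theta, sin2theta, width):
    """Pick the best pixel-level antipodal grasp from dense prediction maps.

    All inputs are H x W arrays, as a fully convolutional head would emit:
    - quality:   per-pixel grasp success score in [0, 1]
    - cos2theta, sin2theta: gripper rotation encoded as (cos 2θ, sin 2θ),
      which avoids the ±π/2 wrap-around of antipodal (two-finger) grasps
    - width:     per-pixel gripper opening width, in pixels
    """
    # Pixel location of the highest-quality grasp candidate.
    idx = np.unravel_index(np.argmax(quality), quality.shape)
    # Recover the rotation angle from its doubled-angle encoding.
    theta = 0.5 * np.arctan2(sin2theta[idx], cos2theta[idx])
    return {"row": int(idx[0]), "col": int(idx[1]),
            "angle": float(theta), "width": float(width[idx]),
            "score": float(quality[idx])}

# Tiny synthetic example: one confident grasp at pixel (2, 3), θ = 45°.
H, W = 5, 6
q = np.zeros((H, W)); q[2, 3] = 0.9
c = np.full((H, W), np.cos(np.pi / 2))   # cos 2θ with θ = π/4
s = np.full((H, W), np.sin(np.pi / 2))   # sin 2θ with θ = π/4
w = np.full((H, W), 20.0)
g = decode_grasp_maps(q, c, s, w)
```

A post-hoc optimization step, such as the paper's local-grasp-path method, would refine or replace the decoded configuration when the top-scoring candidate fails.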

Key words: deep learning, object grasping detection, fully convolutional neural network, robot vision
