Automation & Computer Science

Efficient Fully Convolutional Network and Optimization Approach for Robotic Grasping Detection Based on RGB-D Images

  • School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China

Accepted date: 2022-02-22

Online published: 2025-03-21

Abstract

Robotic grasp detection is a fundamental vision task for robots. Deep learning-based methods have achieved excellent results in detecting grasps for model-free objects in unstructured scenes. Most popular approaches explore deep network models and exploit RGB-D images, which combine colour and depth data, to obtain richer feature representations. However, current work struggles to balance accuracy against real-time performance, pays insufficient attention to the differing distributions of RGB and depth features, and lacks a treatment of failed predictions. We propose an efficient fully convolutional network that predicts pixel-level antipodal grasp parameters from RGB-D images. A hierarchical feature-fusion structure is built from multiple lightweight feature extraction blocks, and a feature fusion module with 3D global attention selects the complementary information in the RGB and depth images. In addition, a grasp configuration optimization method based on the local grasp path is proposed to handle failures predicted by the model. Extensive experiments on two public grasping datasets, Cornell and Jacquard, demonstrate that the approach improves the performance of grasping unknown objects.
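To make the pipeline concrete, below is a minimal PyTorch sketch of the two ideas named in the abstract: fusing RGB and depth features with a global attention gate, and predicting pixel-level antipodal grasp parameters (quality, orientation, width). This is not the authors' released code; the module names, channel sizes, the channel-wise gating used to stand in for the 3D global attention, and the cos(2θ)/sin(2θ) angle encoding (a common choice in pixel-wise grasp detection) are all illustrative assumptions.

```python
# Illustrative sketch only: a simplified RGB-D fusion gate and a per-pixel
# antipodal grasp head. Architecture details are assumptions, not the paper's.
import torch
import torch.nn as nn


class GlobalAttentionFusion(nn.Module):
    """Fuse RGB and depth feature maps with a global channel-wise gate.

    A simplified stand-in for a 3D global attention fusion module: global
    average pooling summarises each modality, a small MLP produces per-channel
    weights, and the re-weighted features are summed.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb_feat.shape
        # Global descriptors of both modalities -> (B, 2C)
        desc = torch.cat([rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1)
        w = self.gate(desc).view(b, 2 * c, 1, 1)
        w_rgb, w_depth = w[:, :c], w[:, c:]
        return w_rgb * rgb_feat + w_depth * depth_feat


class PixelGraspHead(nn.Module):
    """Predict per-pixel grasp quality, orientation (cos 2θ, sin 2θ) and width."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.quality = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.cos2theta = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.sin2theta = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.width = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, fused: torch.Tensor):
        q = torch.sigmoid(self.quality(fused))
        # Recover the grasp angle from its 2θ encoding (periodic in π).
        angle = 0.5 * torch.atan2(self.sin2theta(fused), self.cos2theta(fused))
        w = torch.sigmoid(self.width(fused))  # normalised gripper opening
        return q, angle, w


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 56, 56)    # features from an RGB branch
    depth = torch.randn(1, 64, 56, 56)  # features from a depth branch
    fused = GlobalAttentionFusion(64)(rgb, depth)
    quality, angle, width = PixelGraspHead(64)(fused)
    # The highest-quality pixel gives the grasp centre; the angle and width at
    # that pixel complete the antipodal grasp configuration.
    print(quality.shape, angle.shape, width.shape)
```

In this reading, the fusion gate addresses the differing RGB and depth feature distributions by letting the network weight each modality per channel, while the dense per-pixel maps are what a local-path optimization step could later refine when the top-scoring grasp fails.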

Cite this article

Nie Wei, Liang Xinwu. Efficient Fully Convolutional Network and Optimization Approach for Robotic Grasping Detection Based on RGB-D Images [J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(2): 399-416. DOI: 10.1007/s12204-023-2615-x

