J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (5): 988-997. DOI: 10.1007/s12204-024-2710-7
Received: 2023-08-09
Accepted: 2023-08-30
Online: 2025-09-26
Published: 2024-02-20
LI Mingyang, BAO Hujun, HUANG Jin
Abstract: This paper proposes a novel algorithm that uses reinforcement learning and curriculum learning to train a robotic arm to manipulate cloth. Current cloth-manipulation algorithms rely heavily on predefined action primitives and assumptions about cloth dynamics, and therefore require substantial human prior knowledge. To avoid this limitation, we propose a semi-sparse reward function that incorporates folding accuracy, combined with a curriculum schedule, to accelerate training and improve policy stability. The proposed method is validated by implementing it in the Stable-Baselines3 framework and training an SAC agent in our physically simulated virtual environment. Compared with conventional domain-adaptation techniques, the results demonstrate the advantages of the curriculum learning scheme and highlight the potential of our approach for advancing robotic cloth-manipulation tasks.
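The abstract describes two key ingredients: a semi-sparse reward built around folding accuracy, and a curriculum schedule that tightens the task over training. The sketch below illustrates one plausible reading of that combination in plain Python; the class and function names, the linear schedule, and all numeric parameters (thresholds, penalty, bonus) are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CurriculumSchedule:
    """Linearly tightens the required folding-accuracy threshold over training.

    All parameters here are hypothetical defaults chosen for illustration.
    """
    start_threshold: float = 0.5   # lenient success criterion early in training
    end_threshold: float = 0.9     # strict criterion once the curriculum completes
    total_steps: int = 1_000_000   # steps over which the threshold is annealed

    def threshold(self, step: int) -> float:
        # Fraction of the curriculum completed, clamped to [0, 1].
        frac = min(step / self.total_steps, 1.0)
        return self.start_threshold + frac * (self.end_threshold - self.start_threshold)


def semi_sparse_reward(fold_accuracy: float, threshold: float,
                       step_penalty: float = 0.01,
                       success_bonus: float = 1.0) -> float:
    """Semi-sparse reward: a small per-step penalty discourages slow policies,
    and a sparse bonus fires only when folding accuracy clears the current
    curriculum threshold."""
    reward = -step_penalty
    if fold_accuracy >= threshold:
        reward += success_bonus
    return reward
```

Under this reading, the environment would query `schedule.threshold(step)` each episode and pass it to `semi_sparse_reward`, so early policies are rewarded for coarse folds while later ones must meet the strict criterion; such a reward could then be plugged into a Stable-Baselines3 SAC agent via a custom Gymnasium environment.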
LI Mingyang, BAO Hujun, HUANG Jin. Dynamic Cloth Folding Using Curriculum Learning[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 988-997.