J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (5): 988-997. DOI: 10.1007/s12204-024-2710-7
Received: 2023-08-09
Accepted: 2023-08-30
Online: 2025-09-26
Published: 2024-02-20
LI Mingyang, BAO Hujun, HUANG Jin
Abstract: This paper proposes a novel algorithm that uses reinforcement learning and curriculum learning to train a robotic arm to manipulate cloth. Current cloth-manipulation algorithms rely heavily on predefined action primitives and assumptions about cloth dynamics, and therefore require substantial human prior knowledge. To avoid this limitation, we propose a semi-sparse reward function that incorporates folding accuracy, combined with a curriculum schedule, to accelerate training and improve policy stability. The proposed method is validated by implementing it in the Stable-Baselines3 framework and training an SAC agent in our physically simulated virtual environment. Compared with conventional domain-adaptation techniques, the results demonstrate the advantages of the curriculum learning scheme and highlight the potential of our approach for advancing robotic cloth-manipulation tasks.
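The abstract describes two key ingredients: a semi-sparse reward built around folding accuracy, and a curriculum schedule that tightens the task over training. The sketch below illustrates one plausible reading of that combination in plain Python; the class and function names, the linear schedule, and all numeric parameters (thresholds, penalty, bonus) are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CurriculumSchedule:
    """Linearly tightens the required folding-accuracy threshold over training.

    All parameters here are hypothetical defaults chosen for illustration.
    """
    start_threshold: float = 0.5   # lenient success criterion early in training
    end_threshold: float = 0.9     # strict criterion once the curriculum completes
    total_steps: int = 1_000_000   # steps over which the threshold is annealed

    def threshold(self, step: int) -> float:
        # Fraction of the curriculum completed, clamped to [0, 1].
        frac = min(step / self.total_steps, 1.0)
        return self.start_threshold + frac * (self.end_threshold - self.start_threshold)


def semi_sparse_reward(fold_accuracy: float, threshold: float,
                       step_penalty: float = 0.01,
                       success_bonus: float = 1.0) -> float:
    """Semi-sparse reward: a small per-step penalty discourages slow policies,
    and a sparse bonus fires only when folding accuracy clears the current
    curriculum threshold."""
    reward = -step_penalty
    if fold_accuracy >= threshold:
        reward += success_bonus
    return reward
```

Under this reading, the environment would query `schedule.threshold(step)` each episode and pass it to `semi_sparse_reward`, so early policies are rewarded for coarse folds while later ones must meet the strict criterion; such a reward could then be plugged into a Stable-Baselines3 SAC agent via a custom Gymnasium environment.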
LI Mingyang, BAO Hujun, HUANG Jin. Dynamic Cloth Folding Using Curriculum Learning[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 988-997.