Automation & Computer Technologies

Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots

YU Xinyi, XU Siyu, FAN Yuehai, OU Linlin

  • College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Received date: 2021-11-23

Accepted date: 2022-01-27

Online published: 2023-08-04

Abstract

To solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable environments, a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) is proposed in this paper for the automatic control of mobile robots. According to environmental feedback, the RL agent in the upper controller outputs optimal parameters to the lower-level MIMO PID controllers, thus realizing real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented that realizes real-time optimal tuning of the control parameters using the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with the Lyapunov-based reward is then derived. Through the policy evaluation and policy improvement steps of soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, a reward function is designed to improve the stability of a line-following robot. Simulation and experimental results show that the proposed adaptive LSAC-PID approach achieves fast convergence, strong generalization, and high real-time performance, and realizes real-time optimal tuning of the MIMO PID parameters without requiring a system model or control-loop decoupling.
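
To make the hierarchical structure concrete, the Python sketch below outlines how an upper-level SAC agent could retune a bank of lower-level PID loops at every control step, with a Lyapunov-like quadratic potential shaping the reward. This is a minimal illustration under stated assumptions, not the authors' implementation: the env and agent objects, the PID class, and all other names are hypothetical placeholders, and the shaping term follows the standard potential-based form of Ng et al. [21] with a negated quadratic Lyapunov candidate as the potential.

    import numpy as np

    class PID:
        """One SISO PID loop; the MIMO controller is a bank of these."""
        def __init__(self):
            self.kp = self.ki = self.kd = 0.0
            self.integral = 0.0
            self.prev_err = 0.0

        def set_gains(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd

        def step(self, err, dt):
            # Standard positional PID law on one tracking-error channel.
            self.integral += err * dt
            deriv = (err - self.prev_err) / dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    def lyapunov_potential(errors):
        # Negated quadratic Lyapunov candidate V(e) = e^T e, used as the
        # shaping potential: driving the error to zero raises the potential.
        return -float(np.dot(errors, errors))

    def shaped_reward(base_reward, errors, next_errors, gamma=0.99):
        # Potential-based shaping F = gamma * Phi(s') - Phi(s), which leaves
        # the optimal policy of the underlying task unchanged (Ng et al. [21]).
        return (base_reward
                + gamma * lyapunov_potential(next_errors)
                - lyapunov_potential(errors))

    def run_episode(env, agent, pids, dt=0.02, gamma=0.99):
        # Upper loop: the SAC agent maps the tracking-error state to one
        # (kp, ki, kd) triple per PID loop. Lower loop: the PID bank acts
        # on the plant. env and agent are hypothetical placeholders.
        errors = env.reset()                      # tracking error of each loop
        done = False
        while not done:
            gains = agent.act(errors)             # flat vector, 3 gains per loop
            for pid, (kp, ki, kd) in zip(pids, np.reshape(gains, (-1, 3))):
                pid.set_gains(kp, ki, kd)
            u = np.array([pid.step(e, dt) for pid, e in zip(pids, errors)])
            next_errors, base_r, done = env.step(u)
            r = shaped_reward(base_r, errors, next_errors, gamma)
            agent.store(errors, gains, r, next_errors, done)  # off-policy replay
            agent.update()                        # one soft policy iteration step
            errors = next_errors

Because the shaping term telescopes along any trajectory, it shifts every return by the same state-dependent offset and thus preserves the optimal policy while giving the agent denser, stability-oriented feedback; this is the intuition behind using a Lyapunov function as the potential.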

Cite this article

YU Xinyi, XU Siyu, FAN Yuehai, OU Linlin. Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots [J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(6): 1085-1102. DOI: 10.1007/s12204-023-2631-x

References

[1] FU Z, CHEN Z P, ZHENG C, et al. A cable-tunnel inspecting robot for dangerous environment [J]. International Journal of Advanced Robotic Systems, 2008, 5(3): 243-248.
[2] NAGATANI K, KIRIBAYASHI S, OKADA Y, et al. Redesign of rescue mobile robot Quince [C]//2011 IEEE International Symposium on Safety, Security, and Rescue Robotics. Kyoto: IEEE, 2011: 13-18.
[3] TUBA E, STRUMBERGER I, ZIVKOVIC D, et al. Mobile robot path planning by improved brain storm optimization algorithm [C]//2018 IEEE Congress on Evolutionary Computation. Rio de Janeiro: IEEE, 2018: 1-8.
[4] WANG Q G, NIE Z Y. PID control for MIMO processes [M]//PID control in the third millennium. London: Springer, 2012: 177-204.
[5] KATEBI R. Robust multivariable tuning methods [M]//PID control in the third millennium. London: Springer, 2012: 255-280.
[6] GIL P, LUCENA C, CARDOSO A, et al. Gain tuning of fuzzy PID controllers for MIMO systems: A performance-driven approach [J]. IEEE Transactions on Fuzzy Systems, 2015, 23(4): 757-767.
[7] BOYD S, HAST M, ÅSTRÖM K J. MIMO PID tuning via iterated LMI restriction [J]. International Journal of Robust and Nonlinear Control, 2016, 26(8): 1718-1731.
[8] SONG Y D, HUANG X C, WEN C Y. Robust adaptive fault-tolerant PID control of MIMO nonlinear systems with unknown control direction [J]. IEEE Transactions on Industrial Electronics, 2017, 64(6): 4876-4884.
[9] HOWELL M N, BEST M C. On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata [J]. Control Engineering Practice, 2000, 8(2): 147-154.
[10] CARLUCHO I, DE PAULA M, VILLAR S A, et al. Incremental Q-learning strategy for adaptive PID control of mobile robots [J]. Expert Systems With Applications, 2017, 80: 183-199.
[11] CARLUCHO I, DE PAULA M, ACOSTA G G, et al. Double Q-PID algorithm for mobile robot control [J]. Expert Systems With Applications, 2019, 137: 292-307.
[12] KONDA V, TSITSIKLIS J. Actor-critic algorithms [C]//12th International Conference on Neural Information Processing Systems. Denver: NIPS, 1999: 1008-1014.
[13] WANG X S, CHENG Y H, WEI S, et al. A proposal of adaptive PID controller based on reinforcement learning [J]. Journal of China University of Mining and Technology, 2007, 17(1): 40-44.
[14] AKBARIMAJD A. Reinforcement learning adaptive PID controller for an under-actuated robot arm [J]. International Journal of Integrated Engineering, 2015, 7(2): 20-27.
[15] CARLUCHO I, DE PAULA M, ACOSTA G G, et al. An adaptive deep reinforcement learning approach for MIMO PID control of mobile robots [J]. ISA Transactions, 2020, 102: 280-294.
[16] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor [C]//35th International Conference on Machine Learning. Stockholm: IMLS, 2018: 1861-1870.
[17] YU X Y, FAN Y H, XU S Y, et al. A self-adaptive SAC-PID control approach based on reinforcement learning for mobile robots [J]. International Journal of Robust and Nonlinear Control, 2022, 32(18): 9625-9643.
[18] WEISZ G, BUDZIANOWSKI P, SU P H, et al. Sample efficient deep reinforcement learning for dialogue systems with large action spaces [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(11): 2083-2097.
[19] YE D H, CHEN G B, ZHANG W, et al. Towards playing full MOBA games with deep reinforcement learning [C]//34th Conference on Neural Information Processing Systems. Vancouver: NIPS, 2020: 1-12.
[20] YE F, CHENG X X, WANG P, et al. Automated lane change strategy using proximal policy optimization-based deep reinforcement learning [C]//2020 IEEE Intelligent Vehicles Symposium. Las Vegas: IEEE, 2020: 1746-1752.
[21] NG A Y, HARADA D, RUSSELL S. Policy invariance under reward transformations: Theory and application to reward shaping [C]//16th International Conference on Machine Learning. Bled: IMLS, 1999: 278-287.
[22] ASMUTH J, LITTMAN M L, ZINKOV R. Potential-based shaping in model-based reinforcement learning [C]//23rd AAAI Conference on Artificial Intelligence. Chicago: AAAI, 2008: 604-609.
[23] DEVLIN S M, KUDENKO D. Dynamic potential-based reward shaping [C]//11th International Conference on Autonomous Agents and Multiagent Systems. Valencia: IFAAMAS, 2012: 433-440.
[24] DEVLIN S, YLINIEMI L, KUDENKO D, et al. Potential-based difference rewards for multiagent reinforcement learning [C]//13th International Conference on Autonomous Agents and Multi-Agent Systems. Paris: IFAAMAS, 2014: 165-172.
[25] WIEWIORA E, COTTRELL G W, ELKAN C. Principled methods for advising reinforcement learning agents [C]//20th International Conference on Machine Learning. Washington: IMLS, 2003: 792-799.
[26] HARUTYUNYAN A, DEVLIN S, VRANCX P, et al. Expressing arbitrary reward functions as potential-based advice [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2015, 29(1): 2652-2658.
[27] BRYS T, HARUTYUNYAN A, SUAY H B, et al. Reinforcement learning from demonstration through shaping [C]//24th International Conference on Artificial Intelligence. Buenos Aires: AAAI, 2015: 3352-3358.
[28] DONG Y L, et al. Principled reward shaping for reinforcement learning via Lyapunov stability theory [J]. Neurocomputing, 2020, 393: 83-90.
[29] SUTTON R S, BARTO A G. Reinforcement learning: An introduction [M]. 2nd ed. Cambridge: MIT Press, 2018.
[30] BALTES J, LIN Y M. Path tracking control of nonholonomic car-like robot with reinforcement learning [M]//RoboCup-99: Robot Soccer World Cup III. Berlin, Heidelberg: Springer, 1999: 162-173.
[31] SAADATMAND S, AZIZI S, KAVOUSI M, et al. Autonomous control of a line follower robot using a Q-learning controller [C]//2020 10th Annual Computing and Communication Workshop and Conference. Las Vegas: IEEE, 2020: 556-561.
[32] MARTINSEN A B, LEKKAS A M. Curved path following with deep reinforcement learning: Results from three vessel models [C]//OCEANS 2018 MTS/IEEE Charleston. Charleston: IEEE, 2018: 1-8.
[33] KUMAR V, NAKRA B C, MITTAL A P. A review on classical and fuzzy PID controllers [J]. International Journal of Intelligent Control and Systems, 2011, 16(3): 170-181.
[34] XU Q, KAN J M, CHEN S N, et al. Fuzzy PID based trajectory tracking control of mobile robot and its simulation in simulink [J]. International Journal of Control and Automation, 2014, 7(8): 233-244.
[35] TIEP D K, LEE K, IM D Y, et al. Design of fuzzy-PID controller for path tracking of mobile robot with differential drive [J]. International Journal of Fuzzy Logic and Intelligent Systems, 2018, 18(3): 220-228.
[36] FURUKAWA S, KONDO S, TAKANISHI A, et al. Radial basis function neural network based PID control for quad-rotor flying robot [C]//2017 17th International Conference on Control, Automation and Systems. Jeju: IEEE, 2017: 580-584.
[37] LIU Q, LI D, GE S S, et al. Adaptive bias RBF neural network control for a robotic manipulator [J]. Neurocomputing, 2021, 447: 213-223.
