Journal of Shanghai Jiaotong University (Science)
Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots
Received date: 2021-11-23
Accepted date: 2022-01-27
Online published: 2023-08-04
YU Xinyi, XU Siyu, FAN Yuehai, OU Linlin. Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots [J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(6): 1085-1102. DOI: 10.1007/s12204-023-2631-x