Journal of Shanghai Jiao Tong University ›› 2025, Vol. 59 ›› Issue (3): 400-412.doi: 10.16183/j.cnki.jsjtu.2023.344
• New Type Power System and the Integrated Energy •
ZHAO Yingying1,2, QIU Yue3, ZHU Tianchen3, LI Fan1,2, SU Yun1,2, TAI Zhenying3, SUN Qingyun3, FAN Hang4
Received: 2023-07-24
Revised: 2023-09-26
Accepted: 2023-11-22
Online: 2025-03-28
Published: 2025-04-02
ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.
Tab.1 Hyper-parameters of model

| Parameter | Meaning | Value |
|---|---|---|
| lr_actor | Initial learning rate of the Actor model | 1×10⁻⁵ |
| lr_critic | Initial learning rate of the Critic model | 1×10⁻³ |
| max_episode | Total number of training episodes | 2×10⁵ |
| batch_size | Number of training samples per batch | 1024 |
| gradient_clip | Gradient clipping upper bound | 1.0 |
| init_action_std | Initial standard deviation of action exploration noise | 0.3 |
| active_function | Model activation function | Tanh |
| mlp_num_layers | Number of hidden layers in the Actor and Critic | 3 |
| history_state_len | Length of the history state sequence | 25 |
| gru_num_layers | Number of GRU layers | 2 |
| gru_hidden_size | GRU hidden layer dimension | 64 |
| gcn_hidden_size | GCN hidden layer dimension | 32 |
| gcn_dropout | GCN dropout rate | 0.1 |
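The settings in Tab.1 can be collected into a plain configuration dictionary, as sketched below. This is illustrative only, not code from the paper; in particular, the `decayed_lr` helper is a hypothetical schedule, assumed only because the table labels the two learning rates as "initial" values.

```python
# Hyper-parameters from Tab.1, gathered as a configuration dict.
# Keys mirror the table's parameter names; values are the reported settings.
CONFIG = {
    "lr_actor": 1e-5,        # initial learning rate of the Actor model
    "lr_critic": 1e-3,       # initial learning rate of the Critic model
    "max_episode": 200_000,  # total number of training episodes
    "batch_size": 1024,      # training samples per batch
    "gradient_clip": 1.0,    # gradient clipping upper bound
    "init_action_std": 0.3,  # initial std of action exploration noise
    "active_function": "Tanh",
    "mlp_num_layers": 3,     # hidden layers in Actor and Critic
    "history_state_len": 25, # length of the history state sequence
    "gru_num_layers": 2,
    "gru_hidden_size": 64,
    "gcn_hidden_size": 32,
    "gcn_dropout": 0.1,
}

def decayed_lr(initial_lr: float, episode: int, decay: float = 0.999) -> float:
    """Hypothetical exponential decay of an 'initial' learning rate.

    The paper does not specify its schedule; this is one common choice.
    """
    return initial_lr * decay ** episode
```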
Tab.2 Evaluation performance in all test cases (mean±variance)

| Test case | Algorithm | x_score | x_round |
|---|---|---|---|
| IEEE-118 | Random | -14.09±8.21 | 21.48±12.88 |
| IEEE-118 | DDPG | 413.65±114.00 | 844.82±192.19 |
| IEEE-118 | TD3 | 497.57±65.75 | 919.82±89.09 |
| IEEE-118 | A2C | 5.95±1.48 | 58.20±3.46 |
| IEEE-118 | PPO | 5.68±1.39 | 56.34±3.06 |
| IEEE-118 | StarHeart | 1327.24±103.59 | 2229.83±186.79 |
| L2RPN-WCCI-2022 | Random | -8.33±6.12 | 20.22±5.84 |
| L2RPN-WCCI-2022 | DDPG | 58.22±16.97 | 126.32±25.17 |
| L2RPN-WCCI-2022 | TD3 | 46.51±11.35 | 100.96±19.60 |
| L2RPN-WCCI-2022 | A2C | 5.43±1.71 | 40.07±2.52 |
| L2RPN-WCCI-2022 | PPO | 6.46±3.23 | 39.71±2.33 |
| L2RPN-WCCI-2022 | StarHeart | 76.56±8.31 | 223.66±15.20 |
| SG-126 | Random | 19.94±1.06 | 30.34±1.89 |
| SG-126 | DDPG | 109.38±13.14 | 141.27±16.98 |
| SG-126 | TD3 | 251.59±27.26 | 371.75±34.36 |
| SG-126 | A2C | 263.69±21.29 | 573.17±59.44 |
| SG-126 | PPO | 150.36±44.69 | 262.03±72.14 |
| SG-126 | StarHeart | 684.30±60.16 | 783.80±79.15 |
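Each cell of Tab.2 reports a mean±variance pair aggregated over repeated evaluation episodes. A minimal sketch of that aggregation is below; the function name and the toy per-episode scores are illustrative assumptions, not the paper's evaluation code.

```python
import statistics

def summarize(values):
    """Aggregate per-episode results into the (mean, variance) pair
    reported in each cell of Tab.2."""
    return statistics.mean(values), statistics.pvariance(values)

# Toy usage with hypothetical per-episode scores
mean_score, var_score = summarize([10.0, 12.0, 14.0])
print(f"{mean_score:.2f}±{var_score:.2f}")  # prints "12.00±2.67"
```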