Journal of Shanghai Jiao Tong University, 2025, Vol. 59, Issue 3: 400-412. doi: 10.16183/j.cnki.jsjtu.2023.344
ZHAO Yingying1,2, QIU Yue3, ZHU Tianchen3, LI Fan1,2, SU Yun1,2, TAI Zhenying3, SUN Qingyun3, FAN Hang4
Received: 2023-07-24
Revised: 2023-09-26
Accepted: 2023-11-22
Online: 2025-03-28
Published: 2025-04-02
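Corresponding author: ZHU Tianchen, Ph.D.; E-mail: catezi@buaa.edu.cn
About the author: ZHAO Yingying (born 1991), M.S., specialist engineer, works on applications of power big data and artificial intelligence.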
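Abstract:
As new-type power systems are built out, the randomness of a high share of renewable energy greatly increases the uncertainty of grid operating modes and poses severe challenges to the secure, stable, and economic operation of the grid. Data-driven artificial intelligence methods such as deep reinforcement learning are therefore important for regulating the grid and supporting dispatch decisions in these systems. However, current online scheduling algorithms based on deep reinforcement learning still struggle to model the high-dimensional decision space and to optimize the scheduling policy, which leads to low search efficiency and slow convergence. This paper proposes an online steady-state scheduling method for new power systems based on hierarchical reinforcement learning, which shrinks the decision space by adaptively selecting the key nodes to adjust. On this basis, a state context-aware module built on the gated recurrent unit (GRU) is introduced to model the high-dimensional environment state. The model takes operating cost, renewable energy accommodation, and limit violations as its combined optimization objectives and accounts for various operating constraints. The effectiveness of the proposed algorithm is verified on the IEEE-118, L2RPN-WCCI-2022, and SG-126 case sets.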
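The two ideas in the abstract, reducing the decision space by acting only on selected key nodes and combining cost, renewable accommodation, and limit violations into one objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the top-k selection rule, the node scores, and the reward weights below are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_key_nodes(scores: Sequence[float], k: int) -> List[int]:
    """High-level step (hypothetical rule): keep the k highest-scoring nodes.

    Returns the selected node indices in ascending order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def apply_low_level_action(
    setpoints: Sequence[float], key_nodes: Sequence[int], deltas: Sequence[float]
) -> List[float]:
    """Low-level step: adjust only the selected nodes, leave the rest unchanged."""
    if len(key_nodes) != len(deltas):
        raise ValueError("one delta is required per selected node")
    updated = list(setpoints)
    for node, delta in zip(key_nodes, deltas):
        updated[node] += delta
    return updated


def composite_reward(
    operating_cost: float,
    renewable_ratio: float,
    num_violations: int,
    w_cost: float = 1.0,   # assumed weight, not from the paper
    w_renew: float = 1.0,  # assumed weight, not from the paper
    w_limit: float = 1.0,  # assumed weight, not from the paper
) -> float:
    """Weighted sum: penalize cost and violations, reward renewable accommodation."""
    return -w_cost * operating_cost + w_renew * renewable_ratio - w_limit * num_violations
```

With four nodes and k = 2, the low-level policy only has to output two adjustments instead of four, which is the sense in which node selection reduces the decision space. In the paper the selection is learned adaptively; the fixed top-k rule here is only a stand-in.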
ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.
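Table 1. Model hyperparameter values
Parameter | Meaning | Value |
---|---|---|
lr_actor | Initial learning rate of the Actor model | 1×10⁻⁵ |
lr_critic | Initial learning rate of the Critic model | 1×10⁻³ |
max_episode | Total number of training episodes | 2×10⁵ |
batch_size | Number of samples per training batch | 1024 |
gradient_clip | Upper bound for gradient clipping | 1.0 |
init_action_std | Initial standard deviation of the action exploration noise | 0.3 |
active_function | Model activation function | Tanh |
mlp_num_layers | Number of hidden layers in the Actor and Critic | 3 |
history_state_len | Length of the history information sequence | 25 |
gru_num_layers | Number of layers in the GRU model | 2 |
gru_hidden_size | Hidden dimension of the GRU model | 64 |
gcn_hidden_size | Hidden dimension of the GCN model | 32 |
gcn_dropout | Dropout rate of the GCN model | 0.1 |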
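For readers who want to reuse these settings, the values in Table 1 can be collected in a single configuration object. The sketch below is one possible way to do so. The class name `HyperParams` is hypothetical, the field names follow the parameter names in the table, and how the authors actually store their configuration is not stated in the source.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HyperParams:
    """Hyperparameter values copied from Table 1 (class name is illustrative)."""

    lr_actor: float = 1e-5          # initial learning rate of the Actor
    lr_critic: float = 1e-3         # initial learning rate of the Critic
    max_episode: int = 200_000      # total number of training episodes
    batch_size: int = 1024          # samples per training batch
    gradient_clip: float = 1.0      # upper bound for gradient clipping
    init_action_std: float = 0.3    # initial std of action exploration noise
    active_function: str = "Tanh"   # activation function
    mlp_num_layers: int = 3         # hidden layers in Actor and Critic
    history_state_len: int = 25     # length of the history sequence
    gru_num_layers: int = 2         # number of GRU layers
    gru_hidden_size: int = 64       # GRU hidden dimension
    gcn_hidden_size: int = 32       # GCN hidden dimension
    gcn_dropout: float = 0.1        # GCN dropout rate


if __name__ == "__main__":
    # Print the configuration as a plain dictionary.
    print(asdict(HyperParams()))
```

Freezing the dataclass prevents accidental changes during training, and `asdict` gives a plain dictionary that can be logged alongside results.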
Table 2. Algorithm performance on each case (mean ± variance)
Case | Algorithm | x_score | x_round |
---|---|---|---|
IEEE-118 | Random | -14.09±8.21 | 21.48±12.88 |
IEEE-118 | DDPG | 413.65±114.00 | 844.82±192.19 |
IEEE-118 | TD3 | 497.57±65.75 | 919.82±89.09 |
IEEE-118 | A2C | 5.95±1.48 | 58.20±3.46 |
IEEE-118 | PPO | 5.68±1.39 | 56.34±3.06 |
IEEE-118 | StarHeart | 1327.24±103.59 | 2229.83±186.79 |
L2RPN-WCCI-2022 | Random | -8.33±6.12 | 20.22±5.84 |
L2RPN-WCCI-2022 | DDPG | 58.22±16.97 | 126.32±25.17 |
L2RPN-WCCI-2022 | TD3 | 46.51±11.35 | 100.96±19.60 |
L2RPN-WCCI-2022 | A2C | 5.43±1.71 | 40.07±2.52 |
L2RPN-WCCI-2022 | PPO | 6.46±3.23 | 39.71±2.33 |
L2RPN-WCCI-2022 | StarHeart | 76.56±8.31 | 223.66±15.20 |
SG-126 | Random | 19.94±1.06 | 30.34±1.89 |
SG-126 | DDPG | 109.38±13.14 | 141.27±16.98 |
SG-126 | TD3 | 251.59±27.26 | 371.75±34.36 |
SG-126 | A2C | 263.69±21.29 | 573.17±59.44 |
SG-126 | PPO | 150.36±44.69 | 262.03±72.14 |
SG-126 | StarHeart | 684.30±60.16 | 783.80±79.15 |
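One way to read Table 2 is to compare the mean score of StarHeart, which appears to be the proposed algorithm, against the strongest baseline in each case. The sketch below does this arithmetic on the mean x_score values copied from the table. The helper names are hypothetical, and a ratio of means ignores the reported spread, so it is a rough summary rather than a statistical comparison.

```python
from typing import Dict, Tuple

# Mean x_score values copied from Table 2 (the spread after "±" is omitted).
MEAN_X_SCORE: Dict[str, Dict[str, float]] = {
    "IEEE-118": {
        "Random": -14.09, "DDPG": 413.65, "TD3": 497.57,
        "A2C": 5.95, "PPO": 5.68, "StarHeart": 1327.24,
    },
    "L2RPN-WCCI-2022": {
        "Random": -8.33, "DDPG": 58.22, "TD3": 46.51,
        "A2C": 5.43, "PPO": 6.46, "StarHeart": 76.56,
    },
    "SG-126": {
        "Random": 19.94, "DDPG": 109.38, "TD3": 251.59,
        "A2C": 263.69, "PPO": 150.36, "StarHeart": 684.30,
    },
}


def best_baseline(scores: Dict[str, float], proposed: str = "StarHeart") -> Tuple[str, float]:
    """Return the name and mean score of the best algorithm other than `proposed`."""
    baselines = {name: value for name, value in scores.items() if name != proposed}
    name = max(baselines, key=lambda n: baselines[n])
    return name, baselines[name]


def improvement_ratio(scores: Dict[str, float], proposed: str = "StarHeart") -> float:
    """Ratio of the proposed method's mean score to the best baseline's mean score."""
    _, best = best_baseline(scores, proposed)
    return scores[proposed] / best


if __name__ == "__main__":
    for case, scores in MEAN_X_SCORE.items():
        name, _ = best_baseline(scores)
        print(f"{case}: best baseline {name}, ratio {improvement_ratio(scores):.2f}")
```

The strongest baseline differs by case (TD3, DDPG, and A2C respectively), which is why the comparison is made per case rather than against a single fixed algorithm.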