Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning

doi:10.16183/j.cnki.jsjtu.2023.344

Journal of Shanghai Jiao Tong University ›› 2025, Vol. 59 ›› Issue (3): 400-412. doi: 10.16183/j.cnki.jsjtu.2023.344

• New Power Systems and Integrated Energy •

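Corresponding author: ZHU Tianchen, Ph.D.; E-mail: catezi@buaa.edu.cn
About the first author: ZHAO Yingying (born 1991), M.S., professional engineer, works on applications of electric power big data and artificial intelligence.
Funding: Science and Technology Project of State Grid Shanghai Municipal Electric Power Company (B3094022000D); Research Project of the Shanghai Engineering Research Center of Electric Power Artificial Intelligence (19DZ2252800)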

ZHAO Yingying¹,², QIU Yue³, ZHU Tianchen³, LI Fan¹,², SU Yun¹,², TAI Zhenying³, SUN Qingyun³, FAN Hang⁴

1. State Grid Shanghai Municipal Electric Power Company, Shanghai 200125, China
2. East China Electric Power Test and Research Institute Co., Ltd., Shanghai 200437, China
3. Beihang University, Beijing 100191, China
4. School of Economics and Management, North China Electric Power University, Beijing 100096, China

Received: 2023-07-24; Revised: 2023-09-26; Accepted: 2023-11-22; Online: 2025-03-28; Published: 2025-04-02

Abstract:

With the construction of new power systems, the randomness of a high proportion of renewable energy greatly increases the uncertainty of grid operating modes, posing severe challenges to the safe, stable, and economical operation of the grid. Data-driven artificial intelligence methods such as deep reinforcement learning are therefore important for grid regulation and decision support in new power systems. However, current online scheduling algorithms based on deep reinforcement learning still struggle to model the high-dimensional decision space and to optimize the scheduling policy, which leads to low search efficiency and slow convergence. To address this, an online steady-state scheduling method for new power systems based on hierarchical reinforcement learning is proposed, which reduces the decision space by adaptively selecting key nodes for adjustment. On this basis, a state context-aware module based on gated recurrent units (GRU) is introduced to model the high-dimensional environment state, and the model is built with operating cost, renewable energy accommodation, and limit violations as the combined optimization objectives, subject to various operational constraints. The effectiveness of the proposed algorithm is validated on the IEEE-118, L2RPN-WCCI-2022, and SG-126 test cases.

Key words: operation scheduling of power grid, reinforcement learning, hierarchical decision making, state representation
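
The abstract describes a two-level decision: first choose a small set of key nodes, then adjust only those nodes, which shrinks the action space. The sketch below illustrates that idea only. It is not the authors' implementation: the node scores are passed in as a plain list (in the paper the selection is learned adaptively), and `select_key_nodes`, `low_level_adjust`, `hierarchical_action`, and the placeholder low-level rule are all hypothetical names and choices made for illustration.

```python
from typing import List, Tuple


def select_key_nodes(scores: List[float], k: int) -> List[int]:
    """High-level step: return the indices of the k highest-scoring nodes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def low_level_adjust(state: List[float], nodes: List[int], max_step: float) -> List[float]:
    """Low-level step: propose a bounded adjustment for the selected nodes only.

    Placeholder rule (an assumption, not the paper's policy): push each
    selected node's deviation toward zero, clipped to [-max_step, max_step].
    """
    return [max(-max_step, min(max_step, -state[i])) for i in nodes]


def hierarchical_action(
    state: List[float], scores: List[float], k: int, max_step: float
) -> Tuple[List[int], List[float]]:
    """Combine both levels into a full-size action that is zero off the key nodes."""
    nodes = select_key_nodes(scores, k)
    deltas = low_level_adjust(state, nodes, max_step)
    action = [0.0] * len(state)
    for i, d in zip(nodes, deltas):
        action[i] = d
    return nodes, action


if __name__ == "__main__":
    state = [0.5, -0.2, 1.4, 0.0, -0.9]   # hypothetical per-node deviations
    scores = [abs(s) for s in state]      # hypothetical importance scores
    print(hierarchical_action(state, scores, k=2, max_step=1.0))
```

With k key nodes out of n, the low-level policy only has to output k values per step instead of n, which is the sense in which the decision space is reduced.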

CLC number: TM933

ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.

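Figures and tables (12): Fig. 1, Fig. 2, Fig. 3, Fig. 4, Table 1, Table 2, Fig. 5, Fig. 6, Fig. 7, Fig. 8, Fig. 9, Table 3 (images not reproduced here)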

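Parameter	Meaning	Value
lr_actor	Initial learning rate of the Actor model	1×10⁻⁵
lr_critic	Initial learning rate of the Critic model	1×10⁻³
max_episode	Total number of training episodes	2×10⁵
batch_size	Number of samples per training batch	1024
gradient_clip	Upper bound for gradient clipping	1.0
init_action_std	Initial standard deviation of the action exploration noise	0.3
active_function	Activation function of the model	Tanh
mlp_num_layers	Number of hidden layers in the Actor and Critic	3
history_state_len	Length of the historical state sequence	25
gru_num_layers	Number of GRU layers	2
gru_hidden_size	Hidden dimension of the GRU	64
gcn_hidden_size	Hidden dimension of the GCN	32
gcn_dropout	Dropout rate of the GCN	0.1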
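
For reuse in code, the hyperparameters in the table above could be gathered into a single immutable configuration object. The following is a minimal sketch under that assumption: the field names and values are copied from the table, while the `TrainConfig` class and its `validate` method are hypothetical and not part of the published work.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; the class itself is hypothetical."""
    lr_actor: float = 1e-5         # initial Actor learning rate
    lr_critic: float = 1e-3        # initial Critic learning rate
    max_episode: int = 200_000     # total training episodes (2×10⁵)
    batch_size: int = 1024         # samples per training batch
    gradient_clip: float = 1.0     # gradient clipping upper bound
    init_action_std: float = 0.3   # initial std of the exploration noise
    active_function: str = "Tanh"  # activation function
    mlp_num_layers: int = 3        # hidden layers in Actor and Critic
    history_state_len: int = 25    # length of the historical state sequence
    gru_num_layers: int = 2        # number of GRU layers
    gru_hidden_size: int = 64      # GRU hidden dimension
    gcn_hidden_size: int = 32      # GCN hidden dimension
    gcn_dropout: float = 0.1       # GCN dropout rate

    def validate(self) -> None:
        """Basic sanity checks (illustrative; thresholds are assumptions)."""
        if not (self.lr_actor > 0 and self.lr_critic > 0):
            raise ValueError("learning rates must be positive")
        if not 0.0 <= self.gcn_dropout < 1.0:
            raise ValueError("gcn_dropout must lie in [0, 1)")
        if min(self.batch_size, self.history_state_len, self.max_episode) <= 0:
            raise ValueError("sizes and counts must be positive")


if __name__ == "__main__":
    cfg = TrainConfig()
    cfg.validate()
    for name, value in asdict(cfg).items():
        print(f"{name} = {value}")
```

Freezing the dataclass keeps a run's settings from being changed by accident after training starts.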

Test case	Algorithm	x_score	x_round
IEEE-118	Random	-14.09±8.21	21.48±12.88
	DDPG	413.65±114.00	844.82±192.19
	TD3	497.57±65.75	919.82±89.09
	A2C	5.95±1.48	58.20±3.46
	PPO	5.68±1.39	56.34±3.06
	StarHeart	1327.24±103.59	2229.83±186.79
L2RPN-WCCI-2022	Random	-8.33±6.12	20.22±5.84
	DDPG	58.22±16.97	126.32±25.17
	TD3	46.51±11.35	100.96±19.60
	A2C	5.43±1.71	40.07±2.52
	PPO	6.46±3.23	39.71±2.33
	StarHeart	76.56±8.31	223.66±15.20
SG-126	Random	19.94±1.06	30.34±1.89
	DDPG	109.38±13.14	141.27±16.98
	TD3	251.59±27.26	371.75±34.36
	A2C	263.69±21.29	573.17±59.44
	PPO	150.36±44.69	262.03±72.14
	StarHeart	684.30±60.16	783.80±79.15
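
To make the comparison easier to read, the "mean±std" cells can be parsed and the StarHeart mean set against the strongest baseline on each test case. The sketch below does this for x_score only. The values are copied from the table above; the helper names `parse_mean_std` and `compare_to_best_baseline` are hypothetical, and a ratio of means ignores the reported standard deviations, so it is only a rough summary.

```python
from typing import Dict, Tuple

# x_score cells copied verbatim from the comparison table ("mean±std").
X_SCORE: Dict[str, Dict[str, str]] = {
    "IEEE-118": {
        "Random": "-14.09±8.21", "DDPG": "413.65±114.00", "TD3": "497.57±65.75",
        "A2C": "5.95±1.48", "PPO": "5.68±1.39", "StarHeart": "1327.24±103.59",
    },
    "L2RPN-WCCI-2022": {
        "Random": "-8.33±6.12", "DDPG": "58.22±16.97", "TD3": "46.51±11.35",
        "A2C": "5.43±1.71", "PPO": "6.46±3.23", "StarHeart": "76.56±8.31",
    },
    "SG-126": {
        "Random": "19.94±1.06", "DDPG": "109.38±13.14", "TD3": "251.59±27.26",
        "A2C": "263.69±21.29", "PPO": "150.36±44.69", "StarHeart": "684.30±60.16",
    },
}


def parse_mean_std(cell: str) -> Tuple[float, float]:
    """Split a 'mean±std' table cell into two floats."""
    mean, std = cell.split("±")
    return float(mean), float(std)


def compare_to_best_baseline(
    results: Dict[str, str], ours: str = "StarHeart"
) -> Tuple[str, float]:
    """Return the best baseline's name and the ratio ours_mean / baseline_mean."""
    means = {name: parse_mean_std(cell)[0] for name, cell in results.items()}
    baselines = {name: m for name, m in means.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, means[ours] / baselines[best]


if __name__ == "__main__":
    for case, results in X_SCORE.items():
        best, ratio = compare_to_best_baseline(results)
        print(f"{case}: best baseline {best}, StarHeart/baseline = {ratio:.2f}")
```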

Algorithm	x_score	x_round
StarHeart	684.30±60.16	783.80±79.15
StarHeart-H	147.19±19.14	190.53±21.40
StarHeart-S	492.61±39.98	600.59±56.17
StarHeart-F	725.34±71.32	794.29±79.32
StarHeart-S-F	551.72±46.98	704.03±74.17
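
The ablation rows can be summarized as a percentage change in mean x_score relative to the full StarHeart model, whose row here matches the SG-126 row of the comparison table. The sketch below uses only the means copied from the table above. This page does not define the -H, -S, and -F variants, so the code does not interpret them, and `percent_change` is a hypothetical helper.

```python
from typing import Dict

# Mean x_score values copied from the ablation table (standard deviations omitted).
ABLATION_X_SCORE: Dict[str, float] = {
    "StarHeart": 684.30,
    "StarHeart-H": 147.19,
    "StarHeart-S": 492.61,
    "StarHeart-F": 725.34,
    "StarHeart-S-F": 551.72,
}


def percent_change(variants: Dict[str, float], reference: str = "StarHeart") -> Dict[str, float]:
    """Percentage change of each variant's mean relative to the reference model."""
    ref = variants[reference]
    return {
        name: 100.0 * (value - ref) / ref
        for name, value in variants.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, change in percent_change(ABLATION_X_SCORE).items():
        print(f"{name}: {change:+.1f}% vs StarHeart")
```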
