New Power Systems and Integrated Energy

ZHAO Yingying (1991—), M.S., professional engineer; engaged in applications of power big data and artificial intelligence technology.
ZHU Tianchen, Ph.D.; E-mail: catezi@buaa.edu.cn.


Funding

Science and Technology Project of State Grid Shanghai Municipal Electric Power Company (B3094022000D); Research Project of the Shanghai Engineering Research Center of Artificial Intelligence for Electric Power (19DZ2252800)

Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning

  • ZHAO Yingying ,
  • QIU Yue ,
  • ZHU Tianchen ,
  • LI Fan ,
  • SU Yun ,
  • TAI Zhenying ,
  • SUN Qingyun ,
  • FAN Hang
  • 1. State Grid Shanghai Municipal Electric Power Company, Shanghai 200125, China
    2. East China Electric Power Test and Research Institute Co., Ltd., Shanghai 200437, China
    3. Beihang University, Beijing 100191, China
    4. School of Economics and Management, North China Electric Power University, Beijing 100096, China

Received date: 2023-07-24

  Revised date: 2023-09-26

  Accepted date: 2023-11-22

  Online published: 2023-12-12


Cite this article

ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online steady-state scheduling of new power systems based on hierarchical reinforcement learning[J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412. DOI: 10.16183/j.cnki.jsjtu.2023.344

Abstract

With the construction of new power systems, the stochasticity of high-proportion renewable energy significantly increases the uncertainty of power grid operation, posing severe challenges to its safe, stable, and economical operation. Data-driven artificial intelligence methods such as deep reinforcement learning are therefore of great significance for grid regulation and decision support in new power systems. However, current online scheduling algorithms based on deep reinforcement learning still struggle to model the high-dimensional decision space and to optimize scheduling strategies, resulting in low search efficiency and slow convergence. Therefore, an online steady-state scheduling method for new power systems based on hierarchical reinforcement learning is proposed, which reduces the decision space by adaptively selecting key nodes for adjustment. On this basis, a state context-aware module based on gated recurrent units is introduced to model the high-dimensional environmental state, and the model is constructed with operating cost, renewable energy accommodation, and limit violations as optimization objectives while accounting for various operational constraints. The effectiveness of the proposed algorithm is validated on the IEEE-118, L2RPN-WCCI-2022, and SG-126 test cases.
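The two-level decision decomposition described in the abstract can be illustrated with a minimal sketch: a high-level policy first selects a key node to adjust (shrinking the decision space), a low-level policy then outputs a bounded setpoint change for that node, and a composite reward combines operating cost, renewable accommodation, and limit violations. All names, scores, and weights below are hypothetical illustrations under these assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of hierarchical scheduling: high-level key-node
# selection followed by a low-level continuous adjustment. The learned
# policies are replaced by deterministic placeholders for illustration.

def high_level_select_key_node(node_scores):
    """Pick the node whose adjustment is scored as most valuable."""
    return max(node_scores, key=node_scores.get)

def low_level_adjust(node, state, max_step=0.1):
    """Output a bounded continuous adjustment for the chosen node (stub policy)."""
    # A trained actor would map (node, state) -> action; here we simply
    # counteract the node's power imbalance, clipped to the step limit.
    raw = state["imbalance"].get(node, 0.0)
    return max(-max_step, min(max_step, -raw))

def reward(cost, renewable_consumed, renewable_available, overlimit_count,
           w_cost=1.0, w_ren=0.5, w_lim=10.0):
    """Composite objective: operating cost, renewable curtailment, violations."""
    curtailment = renewable_available - renewable_consumed
    return -(w_cost * cost + w_ren * curtailment + w_lim * overlimit_count)

# One illustrative scheduling step on toy data.
state = {"imbalance": {"bus4": 0.3, "bus17": -0.05}}
scores = {"bus4": 0.9, "bus17": 0.2}          # high-level value estimates
node = high_level_select_key_node(scores)      # -> "bus4"
action = low_level_adjust(node, state)         # clipped to -0.1
r = reward(cost=120.0, renewable_consumed=80.0,
           renewable_available=100.0, overlimit_count=1)
```

In the paper's setting the high-level scores would come from a value network over a GRU-encoded state history rather than fixed numbers; the sketch only shows how restricting adjustment to one key node per step keeps the action space small.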
