Journal of Shanghai Jiao Tong University, 2025, Vol. 59, Issue (3): 400-412. doi: 10.16183/j.cnki.jsjtu.2023.344

• New Power Systems and Integrated Energy •

Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning

ZHAO Yingying1,2, QIU Yue3, ZHU Tianchen3, LI Fan1,2, SU Yun1,2, TAI Zhenying3, SUN Qingyun3, FAN Hang4

  1. State Grid Shanghai Municipal Electric Power Company, Shanghai 200125, China
  2. East China Electric Power Test and Research Institute Co., Ltd., Shanghai 200437, China
  3. Beihang University, Beijing 100191, China
  4. School of Economics and Management, North China Electric Power University, Beijing 100096, China
  • Received: 2023-07-24 Revised: 2023-09-26 Accepted: 2023-11-22 Online: 2025-03-28 Published: 2025-04-02
  • Corresponding author: ZHU Tianchen, Ph.D.; E-mail: catezi@buaa.edu.cn
  • About the author: ZHAO Yingying (born 1991), master's degree, professional engineer, works on applications of electric power big data and artificial intelligence technology.
  • Funding:
    Science and Technology Project of State Grid Shanghai Municipal Electric Power Company (B3094022000D); Research Project of Shanghai Engineering Research Center of Electric Power Artificial Intelligence (19DZ2252800)

Abstract:

With the construction of new power systems, the randomness of a high share of renewable energy greatly increases the uncertainty of grid operating modes, posing severe challenges to the safe, stable, and economical operation of the grid. Data-driven artificial intelligence methods such as deep reinforcement learning are of great significance for grid regulation and decision support in new power systems. However, current online scheduling algorithms based on deep reinforcement learning still struggle to model the high-dimensional decision space and to optimize the scheduling policy, which leads to low search efficiency and slow convergence. Therefore, an online steady-state scheduling method for new power systems based on hierarchical reinforcement learning is proposed, which shrinks the decision space by adaptively selecting key nodes for adjustment. On this basis, a state context-aware module based on gated recurrent units is introduced to model the high-dimensional environment state, and the model is built with operating cost, energy accommodation, and limit violations jointly as optimization objectives, subject to various operational constraints. The effectiveness of the proposed algorithm is validated on the IEEE-118, L2RPN-WCCI-2022, and SG-126 test cases.

Key words: operation scheduling of power grid, reinforcement learning, hierarchical decision making, state representation
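
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: a high-level step that selects a few key nodes, and a low-level step that adjusts only those nodes, scored by a reward that combines cost, energy accommodation, and limit violations. All function names, the top-k selection rule, the weights, and the definition of accommodation as the fraction of available renewable output actually used are assumptions made for illustration, not the authors' method. The gated-recurrent-unit state module is not shown.

```python
import numpy as np

def select_key_nodes(scores, k):
    """High-level step (assumed rule): indices of the k highest-scoring nodes."""
    return np.argsort(scores)[-k:][::-1]

def apply_low_level_action(setpoints, key_nodes, deltas, lower, upper):
    """Low-level step: adjust only the selected nodes, then clip to their limits."""
    new_setpoints = setpoints.copy()
    new_setpoints[key_nodes] += deltas
    return np.clip(new_setpoints, lower, upper)

def reward(cost, renewable_used, renewable_available, n_violations,
           w_cost=1.0, w_renew=1.0, w_limit=1.0):
    """Hypothetical weighted sum of the three objectives named in the abstract."""
    accommodation = renewable_used / max(renewable_available, 1e-9)
    return -w_cost * cost + w_renew * accommodation - w_limit * n_violations

# Toy example with 4 adjustable nodes, of which only 2 are adjusted per step.
scores = np.array([0.1, 0.9, 0.3, 0.7])        # would come from a learned policy
key_nodes = select_key_nodes(scores, k=2)
setpoints = np.array([10.0, 20.0, 30.0, 40.0])
new_setpoints = apply_low_level_action(
    setpoints, key_nodes, deltas=np.array([5.0, -50.0]),
    lower=np.zeros(4), upper=np.full(4, 50.0))
step_reward = reward(cost=2.0, renewable_used=8.0,
                     renewable_available=10.0, n_violations=1)
```

In this toy case the low-level action has 2 dimensions instead of 4. This illustrates how restricting adjustment to selected nodes reduces the decision space.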

CLC number: