Journal of Shanghai Jiao Tong University, 2024, Vol. 58, Issue (5): 682-692. doi: 10.16183/j.cnki.jsjtu.2022.358

• New-Type Power System and Integrated Energy •

Coordinated Active Power-Frequency Control Based on Safe Deep Reinforcement Learning

ZHOU Yi1, ZHOU Liangcai1, SHI Di2, ZHAO Xiaoying2, SHAN Xin3

  1. State Grid East China Branch, Shanghai 200002, China
  2. AINERGY, Santa Clara 95051, USA
  3. NARI Technology Development Co., Ltd., Nanjing 210024, China
  • Received: 2022-09-13 Revised: 2023-02-15 Accepted: 2023-02-24 Online: 2024-05-28 Published: 2024-06-17
  • Corresponding author: ZHOU Liangcai, senior engineer; E-mail: liangcaizhou@163.com.
  • About the author: ZHOU Yi (1982-), senior engineer, mainly engaged in research on power grid dispatching and power system automation.
  • Funding:
    Science and Technology Project of State Grid East China Branch (SGHD0000DKJS2100235)

Abstract:

The continuous increase in renewable penetration poses a severe challenge to frequency control in interconnected power grids. Because the conventional automatic generation control (AGC) strategy does not consider the power flow security constraints of the network, the traditional approach makes tentative generator power adjustments based on expert knowledge and experience, which is time-consuming. AGC optimization models based on optimal power flow take a long time to solve and have convergence issues because of their non-convexity and large scale. Conventional deep reinforcement learning has the advantage of “offline training and online end-to-end strategy formation”, but it cannot guarantee system security during action exploration. A coordinated optimal control method for active power and frequency based on safe deep reinforcement learning is therefore proposed. First, the frequency control problem is modeled as a constrained Markov decision process, and the agent is designed by adding the relevant security constraints to the decision process. Then, the agent is trained through continuous interaction with the grid, and its performance improved, on a case based on the actual East China Power Grid system. Finally, the agent's decisions are compared with the conventional AGC strategy. The results show that the proposed method can quickly generate active power-frequency control strategies under various operating conditions while keeping the grid secure during frequency recovery, and can assist dispatchers in making decisions online.
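
For readers unfamiliar with the term, a constrained Markov decision process is commonly written in the generic form sketched below. This is the standard textbook formulation, not necessarily the specific model used in the paper, and every symbol is an assumption introduced here for illustration: \pi is the agent's policy, s_t and a_t are the state and action at step t (for example, grid measurements and generator set-point adjustments), r is the reward (for example, a penalty on frequency deviation), c_i is the cost associated with the i-th security constraint (for example, a line flow limit violation), d_i is its allowed budget, and \gamma is the discount factor.

```latex
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, r(s_t,a_t)\Big] \\
\text{s.t.}\quad & J_{c_i}(\pi) = \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, c_i(s_t,a_t)\Big] \le d_i,
\qquad i = 1,\dots,m
\end{aligned}
```

One common way to handle such constraints, which may or may not be the approach taken in the paper, is Lagrangian relaxation: the agent maximizes J_r(\pi) - \sum_i \lambda_i (J_{c_i}(\pi) - d_i) while the multipliers \lambda_i \ge 0 are increased whenever a constraint is violated.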

Key words: coordinated power and frequency control, artificial intelligence (AI), safe deep reinforcement learning, constrained Markov decision process, agent

CLC Number: