Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (S2): 7-14. doi: 10.16183/j.cnki.jsjtu.2021.S2.002


Power Network Topology Optimization and Power Flow Control Based on Deep Reinforcement Learning

ZHOU Yi, ZHOU Liangcai, DING Jiali, GAO Jianing

  1. East Branch of State Grid Corporation of China, Shanghai 200120, China
  • Received: 2021-10-28 Online: 2021-12-28 Published: 2022-01-24
  • Contact: ZHOU Liangcai E-mail: liangcaizhou@163.com
  • About the author: ZHOU Yi (1982-), male, born in Shanghai, senior engineer; his research focuses on power grid dispatching and power system automation.

Abstract:

In the pursuit of carbon neutrality, major changes on both the generation side and the load side pose new requirements and challenges for power grid operation and for dispatchers. Real-time network topology optimization and control (NTOC) is a low-cost and effective system-level mitigation measure. However, beyond the simplest action of line switching, the combinatorial and nonlinear nature of the NTOC problem makes it difficult for conventional optimization methods to reach a solution within a short time. This paper proposes a novel artificial intelligence (AI) based approach that maximizes the available transfer capability (ATC) of the system through network topology control while considering various practical constraints and uncertainties. First, imitation learning is used to give the AI agent a good initial policy. Then, the agent is trained through deep reinforcement learning with a guided exploration technique, which significantly improves training efficiency. Finally, an early warning mechanism is designed to help the agent find good topology control strategies over long operating horizons and identify the proper time to act, which effectively improves the fault tolerance and robustness of the method. The approach is tested on open-source data for the IEEE 14-bus system, and the results verify its effectiveness.
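
The early warning idea in the abstract can be illustrated with a minimal sketch. The abstract does not state what triggers the warning, so everything here is an assumption made for illustration: the trigger is taken to be the highest line loading ratio (flow divided by thermal limit), the 0.9 threshold is arbitrary, and the names `early_warning_action`, `propose_action`, and `do_nothing` are hypothetical rather than taken from the paper.

```python
from typing import Callable, Sequence


def early_warning_action(
    line_loadings: Sequence[float],
    propose_action: Callable[[Sequence[float]], int],
    warning_threshold: float = 0.9,  # assumed value, not from the paper
    do_nothing: int = 0,             # assumed index of the "no change" action
) -> int:
    """Return a topology action only when some line is close to its limit.

    line_loadings: per-line loading ratios (flow / thermal limit).
    propose_action: hypothetical stand-in for the trained agent's policy;
        it maps the loadings to a discrete topology action index.
    """
    # With no measurements there is nothing to warn about.
    if not line_loadings:
        return do_nothing
    # Below the warning threshold the grid is left untouched, so the
    # agent does not disturb a topology that is already operating safely.
    if max(line_loadings) < warning_threshold:
        return do_nothing
    # At or above the threshold, defer to the agent's policy.
    return propose_action(line_loadings)
```

Under these assumptions, the gate keeps the agent idle during normal operation and queries the policy only when at least one line approaches its limit, which is one plausible reading of "identifying a proper action time".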

Key words: topology optimization and control, available transfer capability, imitation learning, deep reinforcement learning

CLC Number: