JOURNAL ARTICLE

Multiagent Twin Delayed Deep Deterministic Policy Gradient Approach for Voltage Control of Distribution System

Abstract

Modern power distribution systems face significant challenges, including voltage violations and active power losses, due to the high penetration of renewable energy sources (RESs). Conventional voltage regulation devices are slow and constrained by operational limitations, while existing Volt/VAR Control (VVC) techniques for reactive power compensation rely primarily on model-based optimization. In contrast, model-free deep reinforcement learning (DRL) methods, such as Deep Deterministic Policy Gradient (DDPG), can adapt to changing grid conditions. However, DDPG suffers from Q-value overestimation and unstable learning, which lead to suboptimal control policies. To address these challenges, this paper proposes a Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MA-TD3) technique to optimize the setpoints of modern voltage control devices. By combining twin critics with delayed policy updates, MA-TD3 improves learning stability and mitigates overestimation bias. The distribution network is partitioned into sub-areas, the coordination among sub-areas is formulated as a Markov game, and the game is solved cooperatively using MA-TD3. The proposed approach is validated on a modified IEEE 33-bus system, where it outperforms existing DRL methods in minimizing voltage violations and active power losses.
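The two mechanisms named above, twin critics and delayed policy updates, can be illustrated with a minimal sketch in plain Python. It follows the generic TD3 recipe rather than the paper's implementation: the function names, the discount factor `gamma`, and the `policy_delay` of 2 are illustrative assumptions, and the multi-agent and power-system parts are omitted.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target built from the smaller of two target-critic estimates.

    Taking the minimum of the twin critics counteracts the Q-value
    overestimation that a single critic (as in DDPG) tends to accumulate.
    All argument names and the default gamma are assumptions for illustration.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor only every `policy_delay` critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Hypothetical numbers: the two critics disagree (5.0 vs 3.0),
    # so the target is built from the more pessimistic estimate.
    y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=3.0, gamma=0.9)
    print(f"target = {y:.2f}")
    for step in range(1, 5):
        print(step, "update actor" if should_update_actor(step) else "critics only")
```

In a multi-agent setting, each sub-area agent would typically apply the same target rule to its own critics; how observations and rewards are shared among agents is specific to the paper and is not shown here.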
