The growing penetration of distributed renewable energy resources necessitates more intelligent and adaptive energy management strategies. In this paper, we propose a novel imbalance-aware control framework for photovoltaic-battery storage systems (PV-BSS) participating in day-ahead electricity markets characterized by strict penalty mechanisms, such as those in Japan. The core of the framework is a Proximal Policy Optimization (PPO)-based deep reinforcement learning (DRL) agent, which is explicitly trained to minimize imbalance penalties by embedding forecast deviations into the reward function. To enhance operational feasibility under real-world constraints, the PPO agent is complemented by a Model Predictive Control (MPC) layer that refines actions in real time based on updated forecasts and system constraints. The proposed framework integrates probabilistic PV forecasting using Lower-Upper Bound Estimation (LUBE) and electricity price prediction via multi-layer perceptron (MLP) models within a unified control loop. In extensive simulations using actual Japanese market data, the method achieves a 47% reduction in imbalance penalties compared to a rule-based strategy and a 26% reduction compared to a DRL model without imbalance awareness. These results highlight the proposed method’s potential for economically efficient and regulation-compliant scheduling in dynamic and penalty-intensive electricity markets.
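To make the idea of embedding forecast deviations into the reward concrete, the following is a minimal sketch of one possible imbalance-aware reward for a single settlement interval. It is an illustration under stated assumptions, not the reward used in the paper: the function name, the linear penalty on the absolute deviation, and all parameter names are hypothetical.

```python
def imbalance_aware_reward(committed_kwh, delivered_kwh, price_per_kwh, penalty_per_kwh):
    """Hypothetical per-interval reward: market revenue minus an imbalance penalty.

    Assumption: the penalty is linear in the absolute deviation between the
    day-ahead commitment and the energy actually delivered. Real market rules
    (including the Japanese imbalance settlement) may differ.
    """
    # Revenue earned for the energy actually delivered in this interval.
    revenue = price_per_kwh * delivered_kwh
    # Deviation from the day-ahead commitment, penalized in both directions.
    imbalance_kwh = abs(delivered_kwh - committed_kwh)
    return revenue - penalty_per_kwh * imbalance_kwh


# Usage: the same commitment with and without a delivery shortfall.
reward_on_target = imbalance_aware_reward(10.0, 10.0, 2.0, 5.0)
reward_shortfall = imbalance_aware_reward(10.0, 8.0, 2.0, 5.0)
```

Under this sketch an agent earns less whenever delivery departs from the commitment, so a policy trained on it has an incentive to schedule the battery to absorb PV forecast errors rather than to maximize energy sales alone.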
Moritz Zebenholzer, Lukas Kasper, Alexander Schirrer, René Hofmann
Alaa Selim, Huadong Mo, H. R. Pota, Daoyi Dong