Maintenance scheduling for wind farms is an emerging research topic that involves uncertainty introduced by weather conditions. However, most existing methods cannot generate maintenance schedules dynamically in response to stochastic weather. This paper formulates the maintenance scheduling problem as a Markov decision process (MDP). The Soft Actor-Critic (SAC) method is used to solve the MDP, whose state space is extremely large. SAC is an off-policy deep reinforcement learning algorithm that applies entropy regularization during action selection; this mechanism accelerates agent training and prevents premature convergence to a local optimum. Numerical examples verify the performance of SAC in maintenance scheduling. Results show that the proposed method achieves higher total production than the deep Q network and the genetic algorithm when stochastic wind speed is considered.
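The entropy-regularization mechanism mentioned above can be illustrated with a minimal sketch. The Q-values, action set, and temperature `alpha` below are hypothetical example numbers, not taken from the paper; the sketch only shows how a soft (Boltzmann) policy with a temperature term keeps action probabilities from collapsing onto a single action, which is the behavior that discourages premature convergence in SAC-style methods.

```python
import math

def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a) proportional to exp(Q(a)/alpha).

    Larger alpha weights the entropy bonus more heavily, so the
    distribution stays closer to uniform and exploration continues.
    """
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy of a discrete distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical Q-values for three candidate maintenance actions.
q = [1.0, 1.2, 0.8]

near_greedy = soft_policy(q, alpha=0.05)  # entropy term negligible
exploratory = soft_policy(q, alpha=1.0)   # entropy bonus keeps exploration alive
```

With a small temperature the policy concentrates almost all mass on the best-valued action, while a larger temperature yields a higher-entropy policy; SAC additionally learns the policy and (optionally) the temperature by gradient descent rather than computing them in closed form as here.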