Ruyue Yang, Ding Wang, Junfei Qiao
To solve the optimal control problem of the nitrate-nitrogen concentration in a wastewater treatment plant (WWTP), a model-free, off-policy iterative adaptive dynamic programming algorithm using online data is proposed. Under the actor-critic structure, the algorithm approximates the optimal Q-function by minimizing the temporal-difference error and improves the control law through the policy gradient method. Neural networks are used to implement the scheme, and an experience replay buffer is used in their off-policy iteration. Finally, simulation examples for a nonlinear system and the WWTP are presented to verify the effectiveness of the proposed method.
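The loop described in the abstract (a critic fitted by reducing the temporal-difference error, an actor improved along the gradient of the Q-function, and mini-batches drawn from a replay buffer) can be illustrated with a minimal sketch. This is not the authors' algorithm or their WWTP model. It makes several assumptions: a hypothetical scalar linear plant (`A_TRUE`, `B_TRUE`) that is used only to generate data, a quadratic cost, a discount factor `GAMMA`, a critic that is linear in quadratic features instead of a neural network, a linear control law u = k x, and states drawn uniformly at random rather than taken from a running trajectory. All function names are illustrative.

```python
import random

A_TRUE, B_TRUE = 0.9, 0.5  # hypothetical plant; the learner never reads these
GAMMA = 0.9                # assumed discount factor


def features(x, u):
    """Quadratic features, so that Q(x, u) = w . (x^2, x*u, u^2)."""
    return (x * x, x * u, u * u)


def q_value(w, x, u):
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def collect_buffer(n, rng):
    """Fill the replay buffer with (x, u, cost, x_next) from a random behavior policy."""
    buffer = []
    for _ in range(n):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        buffer.append((x, u, x * x + u * u, A_TRUE * x + B_TRUE * u))
    return buffer


def critic_step(w, k, batch, lr):
    """One semi-gradient step on the squared TD error (target held fixed)."""
    grad = [0.0, 0.0, 0.0]
    for x, u, cost, x_next in batch:
        # The next action comes from the target policy u = k * x, not from the data.
        target = cost + GAMMA * q_value(w, x_next, k * x_next)
        td_error = target - q_value(w, x, u)
        for i, f in enumerate(features(x, u)):
            grad[i] += td_error * f / len(batch)
    return [wi + lr * gi for wi, gi in zip(w, grad)]


def actor_step(w, k, batch, lr):
    """Move the gain k downhill along dQ/du * du/dk, since Q is a cost-to-go."""
    grad = 0.0
    for x, _, _, _ in batch:
        dq_du = w[1] * x + 2.0 * w[2] * (k * x)
        grad += dq_du * x / len(batch)
    return k - lr * grad


def train(iters=4000, batch_size=64, seed=0):
    rng = random.Random(seed)
    buffer = collect_buffer(2000, rng)
    w, k = [0.0, 0.0, 0.0], 0.0
    for _ in range(iters):
        batch = rng.sample(buffer, batch_size)  # experience replay
        w = critic_step(w, k, batch, lr=0.1)
        k = actor_step(w, k, batch, lr=0.02)
    return w, k


if __name__ == "__main__":
    weights, gain = train()
    print("critic weights:", weights)
    print("feedback gain:", gain)
```

The sketch is off-policy in the sense that the buffered actions come from a random behavior policy, while the TD target evaluates the current control law. Replacing the linear critic and actor with neural networks, as the abstract describes, would change the two update functions but not the overall loop.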
Yongwei Zhang, Bo Zhao, Derong Liu
Hao Zhang, Yan Li, Zhuping Wang, Yi Ding, Huaicheng Yan
Xin Tong, Dazhong Ma, Zhanshan Wang, Zhongyang Ming, Xiangpeng Xie