JOURNAL ARTICLE

Federated deep reinforcement learning-based urban traffic signal optimal control

M. Li, Xiaolong Pan, Chuhui Liu, Zirui Li

Year: 2025 | Journal: Scientific Reports | Vol: 15 (1) | Article: 11724 | Publisher: Nature Portfolio

Abstract

This paper proposes a cross-domain intelligent traffic signal control method based on federated Proximal Policy Optimization (PPO), in which agents at typical intersections are jointly trained in a distributed manner across domains. The method addresses the slow learning and poor model generalization that arise when deep reinforcement learning (RL) is applied to cross-domain multi-intersection traffic signal control. While preserving information security and data privacy, the global cross-region distributed joint training improves the generalization ability of the different local models, handles the non-independent and identically distributed (non-IID) environmental data that agents face at real intersections, and significantly accelerates convergence during the training phase. By carefully designing the state, action, and reward functions and determining optimal values for several key parameters of the federated collaboration mechanism, the RL model maintains high learning efficiency and fast convergence even as the road network grows and the state and action spaces increase exponentially with the number of intersections. In addition, a new state interaction method and reward function allow the agents to collaborate with each other, which greatly improves the efficiency of information exchange between the federated learning local agents and the central coordinator, improves road network throughput, and reduces the volume of transmitted communication data.
Finally, experimental comparisons show that the proposed method reduces the average vehicle waiting time by up to 27.34% compared with the existing fixed-timing method. At the same convergence level, it converges up to 47.69% faster than an individual PPO agent trained in a single local environment, and up to 45.35% faster than an aggregated PPO agent trained jointly on all local data. The proposed method effectively optimizes intersection throughput and shows excellent robustness under various traffic flow settings.
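The abstract describes local PPO agents whose models are periodically combined by a central coordinator. The paper's exact aggregation rule is not given here; a minimal sketch, assuming a standard FedAvg-style weighted average of local policy parameters (with hypothetical agent weight vectors and transition counts), illustrates one federation round:

```python
import numpy as np

def fed_avg(local_weights, sample_counts):
    """Weighted federated averaging of local policy parameters.

    local_weights: list of 1-D parameter vectors, one per local agent.
    sample_counts: number of environment transitions each agent collected,
        used to weight its contribution to the global model.
    """
    total = float(sum(sample_counts))
    stacked = np.stack(local_weights)                      # (n_agents, n_params)
    coeffs = np.array(sample_counts, dtype=float) / total  # aggregation weights
    return coeffs @ stacked                                # global parameter vector

# Toy round: three intersection agents with different data volumes.
agents = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
counts = [100, 200, 100]
global_w = fed_avg(agents, counts)
print(global_w)  # -> [3. 4.]
```

The coordinator would then broadcast `global_w` back to each agent to initialize the next round of local PPO updates; weighting by sample count is one common way to handle the non-IID data volumes across intersections.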

Keywords:
Reinforcement learning; Traffic signal control; Artificial intelligence; Real-time computing; Computer science; Engineering

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 35.18
References: 42
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Traffic Prediction and Management Techniques
Physical Sciences →  Engineering →  Building and Construction
Traffic control and management
Physical Sciences →  Engineering →  Control and Systems Engineering
Vehicular Ad Hoc Networks (VANETs)
Physical Sciences →  Engineering →  Electrical and Electronic Engineering