JOURNAL ARTICLE

Multi-agent reinforcement learning for traffic signal control

Abstract

Optimal control of traffic lights at junctions, known as traffic signal control (TSC), is essential for reducing the average delay experienced by road users amid the rapid growth in vehicle usage. In this paper, we formulate the TSC problem as a discounted-cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) algorithms to obtain dynamic TSC policies. We model each traffic signal junction as an independent agent. Each agent decides the signal duration of its phases in a round-robin (RR) manner using multi-agent Q-learning with either ε-greedy or UCB [3] based exploration strategies. It updates its Q-factors based on the cost feedback signal received from its neighbouring agents. This feedback signal is easy to construct and is shown to be effective in minimizing the average delay of vehicles in the network. We show through simulations in VISSIM that our algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm [15] on two real road networks.
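
The per-junction learning loop described in the abstract can be illustrated with a minimal sketch of tabular Q-learning with ε-greedy exploration in a cost-minimizing setting. This is not the authors' implementation: the state encoding, the candidate green durations, the hyperparameters, and the `combined_cost` helper (own queue length plus mean neighbour queue length) are all hypothetical choices made for illustration, since the abstract does not specify them. In a round-robin scheme, the state could include the index of the phase currently being served, with the agent choosing a duration for that phase before moving to the next one.

```python
import random
from collections import defaultdict


class JunctionAgent:
    """Tabular Q-learning agent for one junction (illustrative sketch only)."""

    def __init__(self, durations, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.durations = list(durations)  # hypothetical candidate green times (s)
        self.alpha = alpha                # learning rate (assumed value)
        self.gamma = gamma                # discount factor (assumed value)
        self.epsilon = epsilon            # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(float)       # Q-factors keyed by (state, duration)

    def choose(self, state):
        """Epsilon-greedy choice; greedy means lowest estimated cost."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.durations)
        return min(self.durations, key=lambda d: self.q[(state, d)])

    def update(self, state, duration, cost, next_state):
        """One Q-learning step for a discounted-cost MDP (min instead of max)."""
        best_next = min(self.q[(next_state, d)] for d in self.durations)
        target = cost + self.gamma * best_next
        self.q[(state, duration)] += self.alpha * (target - self.q[(state, duration)])


def combined_cost(own_queue, neighbour_queues):
    """Hypothetical cost signal: own queue plus mean neighbour queue length."""
    if not neighbour_queues:
        return float(own_queue)
    return own_queue + sum(neighbour_queues) / len(neighbour_queues)
```

A usage pattern under these assumptions would call `choose(state)` at the start of each phase, run the simulator for the chosen duration, build the cost from local and neighbouring queue measurements, and then call `update(...)` with the observed next state. Swapping the ε-greedy rule for a UCB-style rule would only change `choose`.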

Keywords:
Reinforcement learning; Markov decision process; Computer science; SIGNAL (programming language); VisSim; Q-learning; Markov process; Signal timing; Real-time computing; Control theory (sociology); Mathematical optimization; Control (management); Traffic signal; Artificial intelligence; Engineering; Mathematics; Microsimulation; Statistics

Metrics

Cited By: 102
FWCI (Field Weighted Citation Impact): 5.05
Refs: 20
Citation Normalized Percentile: 0.96

Topics

Traffic control and management
Physical Sciences →  Engineering →  Control and Systems Engineering
Transportation Planning and Optimization
Social Sciences →  Social Sciences →  Transportation
Smart Parking Systems Research
Physical Sciences →  Engineering →  Building and Construction