A team of researchers from Chung Ang University in Korea, led by Professor Keemin Sohn, has proposed a meta-reinforcement-learning (meta-RL) model for traffic-signal control. Specifically, the team developed a context-based meta-RL model that incorporates an extended deep Q-network (EDQN).
Traditional traffic signal controllers often cannot handle congestion. Existing systems rely on theory- or rule-based controllers that alter the lights according to traffic conditions. The objective is to minimize vehicle delay under normal traffic conditions and to maximize vehicle throughput during congestion. However, traditional controllers cannot switch between these objectives, and a human controller can manage only a few intersections.
Reinforcement learning (RL) could potentially solve this problem. However, RL typically assumes a stationary environment, and traffic environments are anything but stationary.
“Existing studies have devised meta-RL algorithms based on intersection geometry, traffic signal phases, or traffic conditions,” explains Sohn. “The present research deals with the non-stationary aspect of signal control according to the congestion levels. The meta-RL works autonomously in detecting traffic states, classifying traffic regimes, and assigning signal phases.”
The model works as follows. It first determines the traffic regime – saturated or unsaturated – using a latent variable that summarizes the overall environmental condition. Depending on the regime, the model either maximizes throughput or minimizes delay, much like a human controller, by selecting traffic signal phases (the action). As with other intelligent learning agents, the action is shaped by a reward: the reward function returns +1 if the controller handled traffic better than in the previous interval and -1 if it did worse. Finally, the EDQN acts as a decoder that jointly controls the traffic signals of multiple intersections.
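The regime-dependent reward described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the occupancy-based regime classifier, its threshold, and all function names are assumptions made here for clarity; the paper infers the regime via a learned latent variable.

```python
SATURATED, UNSATURATED = "saturated", "unsaturated"

def infer_regime(occupancy, threshold=0.8):
    """Hypothetical stand-in for the latent-variable regime inference:
    classify the regime from average detector occupancy.
    The 0.8 threshold is an illustrative assumption."""
    return SATURATED if occupancy >= threshold else UNSATURATED

def reward(regime, prev_metric, curr_metric):
    """Reward of +1 or -1 depending on whether this interval handled
    traffic better or worse than the previous interval, where "better"
    depends on the regime:
      unsaturated -> lower vehicle delay is better
      saturated   -> higher throughput is better
    """
    if regime == UNSATURATED:
        improved = curr_metric < prev_metric  # metric = vehicle delay
    else:
        improved = curr_metric > prev_metric  # metric = throughput
    return 1 if improved else -1
```

For example, in an unsaturated regime a drop in delay from 10 s to 8 s yields a reward of +1, while in a saturated regime a drop in throughput from 100 to 90 vehicles yields -1.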
Following its theoretical development, the researchers trained and tested their meta-RL algorithm using Vissim v21.0, a commercial traffic simulator, to mimic real-world traffic conditions. Further, a transportation network in southwest Seoul consisting of 15 intersections was chosen as a real-world testbed. Following meta-training, the model could adapt to new tasks during meta-testing without adjusting its parameters.
The simulation experiments revealed that the proposed model could switch control tasks (via transitions) without any explicit traffic information. It could also differentiate between rewards according to the saturation level of traffic conditions. Further, the EDQN-based meta-RL model outperformed the existing algorithms for traffic signal control and could be extended to tasks with different transitions and rewards.
Nevertheless, the researchers pointed to the need for an even more precise algorithm to consider different saturation levels from intersection to intersection.
“Existing research has employed reinforcement learning for traffic signal control with a single fixed objective,” says Sohn. “In contrast, this work has devised a controller that can autonomously select the optimal target based on the latest traffic condition. The framework, if adopted by traffic signal control agencies, could yield travel benefits that have never been experienced before.”