With the rapid development of the Internet of Things (IoT), many heterogeneous IoT devices provide atomic services that must be integrated and composed into complex services to meet intricate user demands. In IoT scenarios, the dynamic nature of the environment often causes the quality of service (QoS) of atomic services to fluctuate or even renders them unavailable. Common heuristic composition optimization algorithms struggle to adapt to such dynamic environments and require intricate design and parameter tuning. This study introduces a deep reinforcement learning-based optimization algorithm, PD3QND, which combines the basic DQN with noisy networks, prioritized experience replay, a double dueling architecture, and learning from demonstrations. Experiments show that, compared with heuristic algorithms and methods such as DQN, our algorithm adaptively balances exploitation and exploration under dynamic QoS changes in manufacturing IoT environments. It avoids the cold-start problem, searches the solution space robustly and efficiently, and achieves faster convergence and stronger adaptability.
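To make the named components concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of three of the building blocks the abstract mentions: a factorised-Gaussian noisy linear layer for exploration, a dueling Q-network head, and a double-DQN target computation. Prioritized experience replay and demonstration-based pretraining are omitted, and all class and function names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (NoisyNet-style exploration)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.w_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.w_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.b_mu = nn.Parameter(torch.empty(out_features))
        self.b_sigma = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / in_features ** 0.5
        nn.init.uniform_(self.w_mu, -bound, bound)
        nn.init.uniform_(self.b_mu, -bound, bound)
        nn.init.constant_(self.w_sigma, sigma0 * bound)
        nn.init.constant_(self.b_sigma, sigma0 * bound)

    @staticmethod
    def _f(x):
        # Signed square-root transform used for factorised noise.
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        w = self.w_mu + self.w_sigma * eps_out.outer(eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return F.linear(x, w, b)


class DuelingNoisyQNet(nn.Module):
    """Dueling head (state value + advantage) on top of noisy layers."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = NoisyLinear(hidden, 1)
        self.adv = NoisyLinear(hidden, n_actions)

    def forward(self, x):
        h = self.body(x)
        v, a = self.value(h), self.adv(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)


def double_dqn_targets(online, target, next_states, rewards, dones, gamma=0.99):
    """Double-DQN target: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        next_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```

In a full agent along these lines, the targets above would feed a TD loss weighted by prioritized-replay importance weights, and the network would first be pretrained on demonstration transitions to mitigate the cold-start problem.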