The Internet of Things (IoT) represents a paradigm shift and has already connected billions of devices across the healthcare, transportation, manufacturing, and smart city sectors. With this exponential growth, resource provisioning, and energy-efficient provisioning in particular, has become a major challenge. IoT devices are typically constrained in power, computing capacity, and bandwidth. Classical optimization models cannot practically handle the heterogeneous and highly dynamic nature of IoT environments. Reinforcement learning (RL) is a promising way to make autonomous, adaptive resource-allocation decisions that reduce energy consumption. This article reviews the literature on reinforcement learning approaches to energy-efficient IoT resource allocation. It introduces the theoretical foundations of RL, namely the Markov Decision Process (MDP), Q-learning, and Deep Reinforcement Learning (DRL), and their application to optimizing power consumption, bandwidth allocation, and computation offloading. The paper discusses popular RL-based architectures, including Q-learning for dynamic spectrum access, Deep Q-Networks (DQN) for task allocation, and actor-critic methods for energy harvesting. It further discusses hybrid solutions that combine RL with edge computing and federated learning to address privacy and scalability problems. The reviewed results indicate that RL-based approaches outperform traditional heuristics because they adapt to dynamic network requirements and consume less power without compromising Quality of Service (QoS). Scalability, convergence speed, interpretability, and practical deployment, however, remain open issues.
The paper concludes that reinforcement learning is a strong paradigm for building sustainable IoT ecosystems, and that future research should focus on lightweight, explainable, and privacy-preserving RL models that can be deployed in resource-constrained IoT settings.
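To make the Q-learning approach to dynamic spectrum access more concrete, the following is a minimal sketch on a toy problem, not a method taken from any of the reviewed works. It assumes a single device that chooses one of three channels per time slot; the state is the channel used in the previous slot, the reward is 1 for a successful transmission, and retuning to a different channel incurs a small energy penalty. The function name `train_channel_selector`, the success probabilities, the switching cost, and all hyperparameters are hypothetical values chosen for illustration only.

```python
import random


def train_channel_selector(success_prob, switch_cost=0.1, gamma=0.8,
                           alpha=0.02, steps=100_000, seed=0):
    """Tabular Q-learning on a toy channel-selection MDP (illustrative only).

    State  = channel used in the previous slot.
    Action = channel to use in the next slot.
    Reward = 1 for a successful transmission, minus an assumed energy
             penalty whenever the radio retunes to a different channel.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    state = 0
    for _ in range(steps):
        # Q-learning is off-policy, so a uniformly random behaviour policy
        # is enough to visit every (state, action) pair in this tiny MDP.
        action = rng.randrange(n)
        success = rng.random() < success_prob[action]
        reward = (1.0 if success else 0.0)
        if action != state:
            reward -= switch_cost  # hypothetical energy cost of retuning
        next_state = action
        # Standard Q-learning update toward the one-step TD target.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


# Hypothetical per-channel success probabilities.
q_table = train_channel_selector([0.1, 0.3, 0.9])
greedy_policy = [row.index(max(row)) for row in q_table]
print(greedy_policy)
```

Under these assumed probabilities, the greedy policy read from the learned table should favour the most reliable channel from every state, because its higher success rate outweighs the one-off switching penalty. A realistic IoT deployment would have a far larger state space, which is where the DQN and actor-critic methods named above come in.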
Qingtian Wang, Beining Feng, Xu Duan, Jiaying Zong, Siyu Chen
Rathindra Nath Dutta, Sasthi C. Ghosh
Suchitra N Shenoy, Ganesh Bhat, Manoj H Gadiyar T