Jinlian Chen, Jun Zhang, Nan Zhao, Yiyang Pei, Ying‐Chang Liang, Dusit Niyato
Federated learning (FL) enables large-scale machine learning without uploading the private data of wireless devices. Because device resources are heterogeneous and limited, FL accuracy and latency depend substantially on which devices participate and on the size of the training dataset. In this letter, to strike a trade-off between FL accuracy and FL latency, a joint device participation, dataset management and resource allocation (DPDMRA) optimization problem is investigated. To solve this non-convex problem, a Markov decision process is formulated for resource-limited wireless FL. Moreover, because the action space is high-dimensional and continuous, a multi-agent softmax deep double deterministic policy gradients (MASD3) method is employed to obtain the optimal DPDMRA strategies. Double actor networks and a softmax operator are designed to alleviate the underestimation bias. Simulation results demonstrate that the proposed deep reinforcement learning (DRL) method can obtain the globally optimal policy without complete information in a dynamic environment. Compared with the baseline schemes, the proposed MASD3 approach achieves a larger system utility with better convergence performance.
Wei Mao, Xingjian Lu, Yuhui Jiang, Haikun Zheng
Wenqi Shi, Sheng Zhou, Zhisheng Niu, Miao Jiang, Lu Geng
Yueyue Dai, Huijiong Yang, Huiran Yang
Xinyi Xu, Gang Feng, Shuang Qin, Yi‐Jing Liu, Yao Sun
Changxiang Wu, Yijing Ren, Daniel K. C. So