Yongbo Yu, Fuxun Yu, Zirui Xu, Di Wang, Minjia Zhang, Ang Li, Shawn Bray, Chenchen Liu, Xiang Chen
As cognitive applications grow more complex, federated learning (FL) increasingly involves compound learning tasks. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection and classification) and expects FL to support life-long learning across all of them. However, our analysis shows that deploying compound FL models for multiple training tasks on a GPU raises two issues: (1) the tasks' skewed data distributions and their corresponding models cause highly imbalanced learning workloads, and current GPU scheduling methods fail to allocate resources effectively among them; (2) consequently, existing FL schemes, which focus only on heterogeneous data distributions while ignoring runtime computing, cannot achieve optimally synchronized federation in practice. To address these issues, we propose a full-stack FL optimization scheme that tackles both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Our work yields two key insights in this research domain: (1) competitive resource sharing benefits parallel model execution, and the proposed concept of a "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation; (2) FL can be further improved by taking architecture-level coordination into account. Our experiments demonstrate that the proposed scheme significantly increases FL throughput.
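The first issue, imbalanced per-task workloads sharing one GPU, can be made concrete with a toy calculation. The sketch below is not the authors' scheduler or their "virtual resource" model: it assumes static GPU shares, throughput that scales linearly with share, and hypothetical task names and workloads, and it compares an equal split against a workload-proportional split as a simple stand-in for workload-aware allocation.

```python
# Toy model, NOT the paper's scheduler: several training tasks share one GPU,
# each task gets a fixed fraction ("share") of it, and throughput scales linearly
# with that share. All task names and workload numbers below are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float  # GPU-seconds one local round needs at a full (1.0) share


def round_time(tasks: list[Task], shares: list[float]) -> float:
    """Round latency = time until the slowest task (the straggler) finishes."""
    return max(t.work / s for t, s in zip(tasks, shares))


def idle_share_seconds(tasks: list[Task], shares: list[float]) -> float:
    """GPU capacity left unused because early finishers keep their static share."""
    T = round_time(tasks, shares)
    return sum(max(0.0, s * T - t.work) for t, s in zip(tasks, shares))


def equal_shares(tasks: list[Task]) -> list[float]:
    # Workload-oblivious baseline: every task gets the same fraction.
    return [1.0 / len(tasks)] * len(tasks)


def proportional_shares(tasks: list[Task]) -> list[float]:
    # Hypothetical workload-aware policy: share proportional to per-task work.
    total = sum(t.work for t in tasks)
    return [t.work / total for t in tasks]


tasks = [Task("detection", 6.0), Task("classification", 2.0), Task("segmentation", 4.0)]
for policy in (equal_shares, proportional_shares):
    shares = policy(tasks)
    print(f"{policy.__name__}: round time {round_time(tasks, shares):.1f} s, "
          f"idle {idle_share_seconds(tasks, shares):.1f} GPU-s")
```

With these made-up numbers, equal shares give an 18 s round in which 6 GPU-seconds sit idle, while proportional shares let all three tasks finish together at 12 s. The same straggler effect compounds across devices in FL, since the server waits for the slowest participant before aggregating.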
Yu, Yongbo; Yu, Fuxun; Xu, Zirui; Wang, Di; Zhang, Minjia; Li, Ang; Liu, ChenChen; Tian, Zhi; Chen, Xiang
Yongbo Yu, Fuxun Yu, Zirui Xu, Di Wang, Minjia Zhang, Ang Li, Chenchen Liu, Zhi Tian, Xiang Chen