The computational efficiency of asynchronous stochastic gradient descent (ASGD) over its synchronous counterpart has been well documented in recent works. Unfortunately, ASGD typically applies only to the setting in which all workers retrieve data from a single shared dataset. As data grow larger and more distributed, new ideas are needed to preserve the efficiency of ASGD in decentralized training. This article proposes a novel ASGD method for decentralized datasets in which each worker can access only its local privacy-preserving dataset. We first observe that, due to the heterogeneity of the decentralized datasets and/or workers, ASGD progresses in wrong directions, leading to undesired solutions. To tackle this issue, we propose a decentralized asynchronous stochastic gradient descent (DASGD) method that weights the stochastic gradients via an importance sampling technique. We prove that DASGD achieves a convergence rate of O(1/K^{1/2}) on nonconvex training problems under mild conditions. Numerical results further substantiate the performance of the proposed algorithm.
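To make the core idea concrete, the following is a minimal, self-contained Python sketch (not the authors' exact algorithm) of an asynchronous-style SGD loop in which each worker's stochastic gradient is re-weighted by an importance-sampling factor, so that workers holding heterogeneous data and updating at different speeds do not bias the aggregate update. The specific weight choice (data fraction divided by update probability) and all names such as update_probs and importance are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y, batch_size, rng):
    # Stochastic gradient of a least-squares loss on one worker's local data.
    idx = rng.integers(0, len(X), size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

# Heterogeneous local datasets: each worker draws from a different distribution.
d, n_workers = 5, 4
w_true = rng.normal(size=d)
datasets = []
for k in range(n_workers):
    X = rng.normal(loc=0.5 * k, scale=1.0, size=(200, d))  # worker-specific shift
    y = X @ w_true + 0.1 * rng.normal(size=200)
    datasets.append((X, y))

# How often each worker manages to push an update (models different worker speeds);
# the importance weights compensate so that slow workers' data is not under-represented.
update_probs = np.array([0.4, 0.3, 0.2, 0.1])
data_fracs = np.full(n_workers, 1.0 / n_workers)   # each worker holds 1/4 of the data
importance = data_fracs / update_probs             # hypothetical importance-sampling weights

w = np.zeros(d)
step = 0.01
for t in range(2000):
    k = rng.choice(n_workers, p=update_probs)      # which worker's update arrives now
    g = local_gradient(w, *datasets[k], batch_size=32, rng=rng)
    w -= step * importance[k] * g                  # importance-weighted stochastic gradient step

print("estimation error:", np.linalg.norm(w - w_true))

Running the sketch drives the estimation error toward zero; dropping the importance[k] factor leaves the iterate biased toward the distributions of the fast workers, which illustrates the "wrong directions" issue the abstract describes.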