JOURNAL ARTICLE

Practical Efficiency of Asynchronous Stochastic Gradient Descent

Abstract

Stochastic gradient descent (SGD) and its distributed variants are essential for leveraging modern computing resources in large-scale machine learning. ASGD [1] is one of the most popular asynchronous distributed variants of SGD. Recent mathematical analyses have shown that, under certain assumptions on the learning task (and ignoring communication cost), ASGD exhibits asymptotically linear speed-up. In practice, however, ASGD does not achieve linear speed-up as the number of learners increases. Motivated by this, we investigate the finite-time convergence properties of ASGD. We observe that the learning rates required by mathematical analyses to guarantee linear speed-up can be very small (and practically sub-optimal with respect to convergence speed), whereas the learning rates chosen in practice (for quick convergence) exhibit only sub-linear speed-up. We show that this observation can in fact be supported by mathematical analysis: in the finite-time regime, better convergence-rate guarantees can be proven for ASGD with a small number of learners, indicating a lack of linear speed-up as the number of learners grows. We therefore conclude that, even ignoring communication cost, ASGD has an inherent inefficiency with respect to increasing the number of learners.
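The tradeoff the abstract describes can be illustrated with a minimal single-threaded simulation of ASGD with gradient staleness. This sketch is not the paper's experimental setup: the objective (a one-dimensional quadratic), the noise level, and the uniform-staleness model are all illustrative assumptions; the staleness model (delay bounded by the number of learners) is merely a common modeling convention in ASGD analyses.

```python
# Minimal simulation of asynchronous SGD (ASGD) with stale gradients.
# Illustrative assumptions: objective f(x) = 0.5 * x^2 (so grad f(x) = x),
# Gaussian gradient noise, and staleness drawn uniformly from
# {0, ..., num_learners - 1}. None of this is taken from the paper.
import random

def asgd_simulate(num_learners, lr, steps, seed=0):
    """Apply the delayed update x_{t+1} = x_t - lr * (grad f(x_{t-tau}) + noise)."""
    rng = random.Random(seed)
    history = [5.0]  # start far from the optimum x* = 0
    for _ in range(steps):
        tau = rng.randrange(num_learners)               # gradient staleness
        stale_x = history[max(0, len(history) - 1 - tau)]
        grad = stale_x + rng.gauss(0.0, 0.1)            # noisy gradient at stale iterate
        history.append(history[-1] - lr * grad)
    return history[-1]

# Compare the final distance to the optimum for few vs. many learners:
# larger staleness makes the same step size less stable, which is why
# analyses shrink the learning rate as the learner count grows.
for k in (1, 16):
    print(k, abs(asgd_simulate(num_learners=k, lr=0.1, steps=200)))
```

Running the loop with a fixed learning rate shows oscillatory, less stable behavior at higher learner counts, mirroring the abstract's point that step sizes guaranteeing speed-up must shrink with the number of learners.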

Keywords:
Asynchronous communication, Stochastic gradient descent, Rate of convergence, Speedup, Mathematical optimization, Parallel computing, Artificial intelligence, Computer science, Mathematics

Metrics

Cited by: 4
FWCI (Field-Weighted Citation Impact): 1.13
References: 37
Citation Normalized Percentile: 0.92


Topics

Stochastic Gradient Optimization Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Privacy-Preserving Technologies in Data
Physical Sciences →  Computer Science →  Artificial Intelligence
Sparse and Compressive Sensing Techniques
Physical Sciences →  Engineering →  Computational Mechanics

Related Documents

JOURNAL ARTICLE

Practical efficiency of asynchronous stochastic gradient descent

Onkar Bhardwaj, Guojing Cong

Journal: IEEE International Conference on High Performance Computing, Data, and Analytics, Year: 2016, Pages: 56-62
JOURNAL ARTICLE

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Guanghui Lan, Yi Zhou

Journal: IEEE Journal on Selected Areas in Information Theory, Year: 2021, Vol: 2 (2), Pages: 802-811
JOURNAL ARTICLE

Asynchronous Stochastic Gradient Descent Over Decentralized Datasets

Yubo Du, Keyou You

Journal: IEEE Transactions on Control of Network Systems, Year: 2021, Vol: 8 (3), Pages: 1212-1224