JOURNAL ARTICLE

Practical Efficiency of Asynchronous Stochastic Gradient Descent

Abstract

Stochastic gradient descent (SGD) and its distributed variants are essential for leveraging modern computing resources in large-scale machine learning. ASGD [1] is one of the most popular asynchronous distributed variants of SGD. Recent mathematical analyses have shown that, under certain assumptions on the learning task (and ignoring communication cost), ASGD exhibits asymptotically linear speed-up. In practice, however, ASGD does not achieve linear speed-up as the number of learners increases. Motivated by this, we investigate the finite-time convergence properties of ASGD. We observe that the learning rates required by mathematical analyses to guarantee linear speed-up can be very small (and practically sub-optimal with respect to convergence speed), whereas the learning rates chosen in practice (for quick convergence) exhibit only sub-linear speed-up. We show that this observation can in fact be supported by mathematical analysis: in the finite-time regime, better convergence-rate guarantees can be proven for ASGD with a small number of learners, indicating a lack of linear speed-up as the number of learners grows. We therefore conclude that, even ignoring communication cost, ASGD has an inherent inefficiency with respect to increasing the number of learners.
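The tradeoff the abstract describes can be illustrated with a minimal single-threaded simulation of ASGD with gradient staleness. This sketch is not the paper's experimental setup: the objective (a one-dimensional quadratic), the noise level, and the uniform-staleness model are all illustrative assumptions; the staleness model (delay bounded by the number of learners) is merely a common modeling convention in ASGD analyses.

```python
# Minimal simulation of asynchronous SGD (ASGD) with stale gradients.
# Illustrative assumptions: objective f(x) = 0.5 * x^2 (so grad f(x) = x),
# Gaussian gradient noise, and staleness drawn uniformly from
# {0, ..., num_learners - 1}. None of this is taken from the paper.
import random

def asgd_simulate(num_learners, lr, steps, seed=0):
    """Apply the delayed update x_{t+1} = x_t - lr * (grad f(x_{t-tau}) + noise)."""
    rng = random.Random(seed)
    history = [5.0]  # start far from the optimum x* = 0
    for _ in range(steps):
        tau = rng.randrange(num_learners)               # gradient staleness
        stale_x = history[max(0, len(history) - 1 - tau)]
        grad = stale_x + rng.gauss(0.0, 0.1)            # noisy gradient at stale iterate
        history.append(history[-1] - lr * grad)
    return history[-1]

# Compare the final distance to the optimum for few vs. many learners:
# larger staleness makes the same step size less stable, which is why
# analyses shrink the learning rate as the learner count grows.
for k in (1, 16):
    print(k, abs(asgd_simulate(num_learners=k, lr=0.1, steps=200)))
```

Running the loop with a fixed learning rate shows oscillatory, less stable behavior at higher learner counts, mirroring the abstract's point that step sizes guaranteeing speed-up must shrink with the number of learners.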

Keywords:
Asynchronous communication, Stochastic gradient descent, Rate of convergence, Speedup, Mathematical optimization, Parallel computing, Artificial intelligence, Computer science, Mathematics

Metrics

Cited by: 4
FWCI (Field-Weighted Citation Impact): 1.13
References: 37
Citation Normalized Percentile: 0.92


Topics

Stochastic Gradient Optimization Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Privacy-Preserving Technologies in Data
Physical Sciences →  Computer Science →  Artificial Intelligence
Sparse and Compressive Sensing Techniques
Physical Sciences →  Engineering →  Computational Mechanics

Related Documents

JOURNAL ARTICLE

Practical efficiency of asynchronous stochastic gradient descent

Onkar Bhardwaj, Guojing Cong

Journal: IEEE International Conference on High Performance Computing, Data, and Analytics, Year: 2016, Pages: 56-62
JOURNAL ARTICLE

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Guanghui Lan, Yi Zhou

Journal: IEEE Journal on Selected Areas in Information Theory, Year: 2021, Vol: 2 (2), Pages: 802-811
JOURNAL ARTICLE

Asynchronous Stochastic Gradient Descent Over Decentralized Datasets

Yubo Du, Keyou You

Journal: IEEE Transactions on Control of Network Systems, Year: 2021, Vol: 8 (3), Pages: 1212-1224