GUO Zhenhua, YAN Ruidong, QIU Zhiyong, ZHAO Yaqian, LI Rengang
Stochastic gradient descent (SGD) algorithms are widely applied in machine learning and deep learning because of their superior performance. However, SGD uses the stochastic gradient of a single sample to approximate the full gradient over all samples, which introduces additional variance at every iteration and causes the convergence curve of SGD to oscillate or even diverge. Effectively reducing this variance is therefore a key challenge. To address it, a variance-reduction optimization algorithm based on mini-batch random sampling, DM-SRG (double mini-batch stochastic recursive gradient), is proposed and applied to both convex and non-convex optimization problems. The algorithm is built around an inner-outer double-loop structure: the outer loop computes the gradient on a mini-batch of random samples to approximate the full gradient and reduce the gradient computation cost, while the inner loop also computes the gradient on a mini-batch of random samples, replacing the single-sample stochastic gradient and improving the convergence stability of the algorithm. A sublinear convergence rate of DM-SRG is theoretically guaranteed for both non-convex and convex objective functions. Furthermore, a dynamic sample-size adjustment strategy based on a performance evaluation model of the computing unit is designed to improve training efficiency. The effectiveness of the algorithm is evaluated via numerical simulation experiments on real datasets of varying sizes. Experimental results show that DM-SRG reduces the loss function value by 18.1% and the average running time by 8.22%.
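The abstract gives no pseudocode, so the following is a minimal sketch of the double-loop structure it describes, assuming a SARAH-style recursive gradient estimator with mini-batches in both loops. The function name `dm_srg`, the batch sizes `B` and `b`, the step size `eta`, and the toy least-squares problem are all illustrative assumptions, not the authors' implementation or their dynamic sample-size strategy.

```python
import numpy as np

def dm_srg(grad_fn, w0, n, eta=0.05, outer_iters=50, inner_iters=20,
           B=256, b=32, seed=0):
    """Hypothetical sketch of a double mini-batch recursive gradient loop.

    grad_fn(w, idx) -> average gradient of the component functions
    indexed by `idx`, evaluated at `w`.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(outer_iters):
        # Outer loop: a large mini-batch approximates the full gradient
        # instead of computing it over all n samples.
        idx_B = rng.choice(n, size=min(B, n), replace=False)
        v = grad_fn(w, idx_B)
        w_prev, w = w, w - eta * v
        for _ in range(inner_iters):
            # Inner loop: a small mini-batch drives the recursive
            # gradient estimator in place of a single-sample gradient.
            idx_b = rng.choice(n, size=min(b, n), replace=False)
            v = grad_fn(w, idx_b) - grad_fn(w_prev, idx_b) + v
            w_prev, w = w, w - eta * v
    return w

# Toy usage on a least-squares problem f_i(w) = 0.5 * (a_i @ w - y_i)^2.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 2000, 10
    A = rng.normal(size=(n, d))
    y = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

    def grad_fn(w, idx):
        Ai = A[idx]
        return Ai.T @ (Ai @ w - y[idx]) / len(idx)

    w_hat = dm_srg(grad_fn, np.zeros(d), n)
    print("final loss:", 0.5 * np.mean((A @ w_hat - y) ** 2))
```

The sketch only illustrates how the two mini-batches play different roles: the outer batch anchors the gradient estimate, and the inner batch updates it recursively between anchors.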