Adaptive bitrate (ABR) selection plays a crucial role in ensuring a satisfactory quality of experience (QoE) in video streaming applications. Recently, the authors of [1] proposed Pensieve, an ABR algorithm based on asynchronous advantage actor-critic (A3C), an on-policy reinforcement learning (RL) method, and showed that it achieves higher QoE than traditional ABR methods. However, Pensieve is sample inefficient and sensitive to random seeds and hyperparameters. In this paper, we present soft actor-critic based deep reinforcement learning for adaptive bitrate streaming (SAC-ABR), an off-policy method that improves QoE over existing state-of-the-art ABR algorithms under a wide variety of network conditions. Built on the maximum entropy RL framework, SAC-ABR maximizes policy entropy alongside the expected reward, thereby achieving a better exploration-exploitation tradeoff than on-policy ABR methods. We present the overall design of SAC-ABR together with its training and testing results, and evaluate its performance against other state-of-the-art ABR algorithms. Our results show that SAC-ABR provides up to 27.42% higher average QoE than Pensieve and substantially higher QoE than traditional fixed-rule based ABR algorithms.
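For reference, the maximum entropy RL objective underlying soft actor-critic (SAC) can be written in its standard form below; this is the generic formulation, not the paper's own notation, with temperature parameter $\alpha$ trading off reward maximization against policy entropy:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Here $\rho_\pi$ denotes the state-action marginal induced by the policy $\pi$, and $\mathcal{H}$ is the entropy; setting $\alpha = 0$ recovers the conventional expected-return objective used by on-policy methods such as A3C.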
Mandan Naresh, Paresh Saxena, Manik Gupta