With the emergence of small cell densification and cache-enabled smart devices, mobile edge caching is regarded as a promising tool to relieve the traffic burden on the core network and reduce end-to-end delay. To fully utilize the limited caching capacity, cooperative caching has been proposed to further improve user experience by exploiting caching diversity. Under such a paradigm, popular contents are prefetched and stored in small base stations (SBSs) or user devices. However, the popularity of a given content may change over time due to human factors. In this paper, we study the cooperative content caching problem from a reinforcement learning perspective. We investigate a delay minimization problem that jointly considers the spatiotemporal variation of content popularity, the cost of content sharing between user devices, and the cost of cooperative caching among SBSs. To address this problem, we propose a two-stage multi-armed bandit learning based online cooperative (MAB-LOC) algorithm. In the first stage, we design a MAB-based algorithm to estimate the content popularity. In the second stage, we design a semidefinite relaxation based approach to obtain the caching strategy. Simulation results show that the proposed algorithm is competitive in terms of cache-hit probability and end-to-end delay.
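To make the first stage concrete, the sketch below illustrates how a multi-armed bandit can estimate content popularity online: each content is an arm, caching a content is a pull, and a request hit is the reward. The abstract does not specify the bandit index used, so this sketch assumes the standard UCB1 rule; the class name `UCBPopularityEstimator`, the cache size, and the ground-truth request distribution are all hypothetical illustration choices, not details from the paper.

```python
import math
import random

class UCBPopularityEstimator:
    """Illustrative UCB1-style bandit for online content-popularity estimation.

    Assumption: each content is one arm; the reward of a cached content is 1
    if it is requested in the current slot, else 0, so each arm's empirical
    mean tracks its popularity.
    """

    def __init__(self, n_contents):
        self.counts = [0] * n_contents    # times each content has been cached
        self.values = [0.0] * n_contents  # empirical hit rate per content
        self.t = 0                        # number of decision rounds so far

    def select(self, cache_size):
        """Pick which contents to cache this round (highest UCB indices)."""
        self.t += 1
        # Exploration bootstrap: probe every content at least once.
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried:
            return untried[:cache_size]
        ucb = [v + math.sqrt(2.0 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return sorted(range(len(ucb)), key=lambda i: ucb[i],
                      reverse=True)[:cache_size]

    def update(self, content, reward):
        """Incrementally update the empirical hit rate of one cached content."""
        self.counts[content] += 1
        n = self.counts[content]
        self.values[content] += (reward - self.values[content]) / n

# Hypothetical usage: Zipf-like request pattern over five contents.
random.seed(0)
est = UCBPopularityEstimator(5)
true_popularity = [0.5, 0.2, 0.15, 0.1, 0.05]  # assumed ground truth
for _ in range(2000):
    cached = est.select(cache_size=2)
    requested = random.choices(range(5), weights=true_popularity)[0]
    for c in cached:
        est.update(c, 1.0 if c == requested else 0.0)
print(est.select(cache_size=2))  # the most popular contents dominate the cache
```

After enough rounds, the estimator concentrates its cache slots on the contents with the highest request probability; the second stage of MAB-LOC would then feed these popularity estimates into the semidefinite relaxation that computes the cooperative caching placement.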
Yunpeng Ma, Weijing Qi, Peng Lin, Mengru Wu, Lei Guo