As a new paradigm, edge caching is deemed an effective alternative that fetches contents at the network edge rather than from remote servers. However, designing an efficient caching mechanism is challenging. First, the content library is a dynamic set rather than a static one. Second, the same content may enjoy different popularity across small base stations (SBSs), resulting in different rewards. These factors require each SBS to learn its own caching decisions in a multi-SBS network. Existing reinforcement learning algorithms either fail to account for the non-stationary environment or provide no performance guarantee, so they no longer work well in this setting. This work proposes MAMAB-C, a multi-agent multi-armed bandit caching framework that guides SBSs to cache contents in a distributed manner. Specifically, we formulate the multi-SBS caching optimization problem as an online integer linear program (ILP) and convert it into a multi-agent multi-armed bandit (MAMAB) problem with resource constraints. MAMAB-C achieves a sub-linear regret bound and significantly outperforms multiple state-of-the-art algorithms.
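To make the bandit view of caching concrete, the following is a minimal illustrative sketch (not the paper's MAMAB-C algorithm): a single SBS treats each content as an arm, uses a standard UCB1 score to trade off exploration and exploitation, and caches the top-`cache_size` contents each round subject to its capacity constraint. The popularity profile, reward model (a cache hit as a Bernoulli reward), and all function names here are assumptions for illustration only.

```python
import math
import random

def ucb_cache_selection(counts, rewards, t, cache_size):
    """Select the top-`cache_size` contents by UCB1 score.

    counts[i]  -- times content i has been cached so far
    rewards[i] -- cumulative reward (cache hits) earned by content i
    t          -- current round (1-indexed)
    """
    scores = []
    for i in range(len(counts)):
        if counts[i] == 0:
            # Explore each content at least once before exploiting.
            scores.append((float("inf"), i))
        else:
            mean = rewards[i] / counts[i]
            bonus = math.sqrt(2 * math.log(t) / counts[i])
            scores.append((mean + bonus, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:cache_size]]

def simulate(popularity, cache_size, T, seed=0):
    """Run one SBS for T rounds against a fixed, unknown popularity profile."""
    rng = random.Random(seed)
    n = len(popularity)
    counts = [0] * n
    rewards = [0.0] * n
    total_hits = 0.0
    for t in range(1, T + 1):
        cached = ucb_cache_selection(counts, rewards, t, cache_size)
        for i in cached:
            # A request hits cached content i with its popularity probability.
            hit = 1.0 if rng.random() < popularity[i] else 0.0
            counts[i] += 1
            rewards[i] += hit
            total_hits += hit
    return total_hits, counts
```

Over enough rounds, the cache converges to the most popular contents, which is the single-agent intuition behind the multi-agent formulation; the full MAMAB setting additionally coordinates reward estimates across SBSs under shared resource constraints.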