In recent years, coupon recommendation has become an essential strategy for e-commerce platforms to attract users and increase transaction volume. However, balancing exploration and exploitation to maximize user engagement remains a significant challenge, and traditional recommendation methods often fail to cope with the dynamic user preferences and sparse feedback of real-world scenarios. To overcome these limitations, this study applies three classical Multi-Armed Bandit (MAB) algorithms, namely Greedy, Upper Confidence Bound (UCB), and Thompson Sampling (TS), to the coupon recommendation task. A series of experiments was designed to evaluate these algorithms from multiple perspectives: the effectiveness of different coupon arms, adaptability to diverse user groups, and robustness in dynamic, non-stationary environments. The experimental results show that TS consistently achieved the best performance across all scenarios, with average rewards of 0.85, 0.82, and 0.78 in the three experiments, respectively. UCB exhibited stable performance with rewards of 0.75, 0.70, and 0.68, while Greedy performed the worst with rewards of 0.65, 0.60, and 0.55. These findings verify the superior adaptability of TS in complex environments and confirm the feasibility and effectiveness of MAB algorithms for optimizing coupon recommendation systems.
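The three policies compared above can be illustrated with a minimal simulation. The sketch below models each coupon as a Bernoulli arm (a click/redeem happens with some unknown probability) and implements epsilon-greedy, UCB1, and Beta-Bernoulli Thompson Sampling as interchangeable policies; the arm probabilities, horizon, and epsilon value are illustrative assumptions, not parameters from the paper's experiments.

```python
import math
import random

def simulate(policy, probs, horizon, seed=0):
    """Run one bandit policy against Bernoulli arms; return average reward."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k       # number of pulls per arm
    rewards = [0.0] * k    # summed reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        arm = policy(t, counts, rewards, rng)
        r = 1.0 if rng.random() < probs[arm] else 0.0  # simulated user response
        counts[arm] += 1
        rewards[arm] += r
        total += r
    return total / horizon

def greedy(eps):
    """Epsilon-greedy: exploit the best empirical mean, explore with prob eps."""
    def policy(t, counts, rewards, rng):
        if rng.random() < eps or 0 in counts:
            return rng.randrange(len(counts))
        means = [rewards[i] / counts[i] for i in range(len(counts))]
        return means.index(max(means))
    return policy

def ucb1(t, counts, rewards, rng):
    """UCB1: empirical mean plus a shrinking confidence bonus."""
    for i, c in enumerate(counts):
        if c == 0:           # pull each arm once before scoring
            return i
    scores = [rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
              for i in range(len(counts))]
    return scores.index(max(scores))

def thompson(t, counts, rewards, rng):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    samples = [rng.betavariate(1 + rewards[i], 1 + counts[i] - rewards[i])
               for i in range(len(counts))]
    return samples.index(max(samples))
```

As a usage example, `simulate(thompson, [0.2, 0.5, 0.8], 5000)` runs Thompson Sampling for 5,000 rounds over three hypothetical coupons; over a long horizon its average reward approaches the best arm's success rate, mirroring the ranking TS > UCB > Greedy reported in the experiments.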