The only constant in the online social behavior is forever changing user intent and preferences.These changes could be inspired by myriad of factors but still have an overall trend e.g.todays popular news will be stale tomorrow etc.Such patterns are especially noticeable in viral trends where an immediate gain in popularity is followed by gradual lost of holistic interest.In this study we focus on design of a recommendation system which accounts for this non-stationary behavior of declining popularity.Contextual multi-armed bandit (contextual MAB) is a popular framework for learning user behavior and personalized recommendations based on the past behavior.Fundamentally MAB solves the exploitationexploration dilemma which aims at minimal guided experimentation required to gain certain level of confidence in its recommendation.Traditionally contextual MABs (e.g.LinUCB) have been used to model stationary user behavior that is not appropriate for the target environment where LinUCB can accumulate linear regret.Here we extend this LinUCB type algorithm to model a decaying environment.We present three algorithms with variable levels of specificity in the assumptions they make about the non-stationary environment.We show by simulation the e↵ectiveness of our methods which illustrates the usefulness of modeling meta-trends in user behavior.
Saeed GhoorchianEvgenii KortukovSetareh Maghsudi
Omar BesbesYonatan GurAssaf Zeevi
Afef Ben Hadj Alaya-FekiÉric MoulinesAlain LeCornec
Robin AllesiardoRaphaël FéraudOdalric-Ambrym Maillard
Xiaofang ZhangQian ZhouTieke HeBin Liang