Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework

Farshad Seifi; Seyed Taghi Akhavan Niaki

doi:10.5267/j.ijiec.2024.6.003

JOURNAL ARTICLE

Year: 2024; Journal: International Journal of Industrial Engineering Computations; Vol.: 15(4); Pages: 951-964; Publisher: Growing Science

Abstract

The stochastic contextual bandit problem, recognized for its effectiveness in navigating the classic exploration-exploitation dilemma through ongoing player-environment interactions, has found broad applications across various industries. This utility stems largely from the algorithms' ability to forecast reward functions accurately and to maintain an optimal balance between exploration and exploitation, an ability that depends on the careful selection and calibration of hyperparameters. However, the dynamic, real-time nature of bandit environments significantly complicates hyperparameter tuning and renders traditional offline methods inadequate. Specialized methods have been developed to address these challenges, but they often face three primary issues: difficulty adapting hyperparameters in ever-changing environments, inability to optimize multiple hyperparameters simultaneously for complex models, and inefficient use of data and of knowledge transferable from analogous tasks. To tackle these issues, this paper introduces a transfer learning-based approach that harnesses knowledge from past tasks to accelerate optimization and dynamically optimizes multiple hyperparameters, making it well suited to fluctuating environments. The method employs a dual Gaussian meta-model strategy, with one model for transfer learning and the other for assessing hyperparameter performance within the current task, which lets it leverage insights from previous tasks while adapting quickly to environmental changes. Furthermore, the framework's meta-model-centric architecture enables simultaneous optimization of multiple hyperparameters. Experimental evaluations show that the approach markedly outperforms competing methods in scenarios with perturbations, performs better in 70% of stationary cases, and matches the competing methods in the remaining 30%. This performance advantage, combined with computational efficiency on par with existing alternatives, positions it as a superior, practical solution for optimizing hyperparameters in contextual bandit settings.
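To make the dual-surrogate idea in the abstract more concrete, here is a small, hypothetical sketch. It is not the authors' algorithm, and it rests on several assumptions. First, hyperparameters are scaled to the unit cube and chosen from a finite candidate grid. Second, a "transfer" Gaussian process is fit once on (hyperparameter, reward) pairs from a previous task. Third, a second Gaussian process is fit on the current task's residuals relative to the transfer model, using only a sliding window of recent observations to follow environmental changes. Fourth, an upper-confidence-bound (UCB) rule picks the next configuration. The names `SimpleGP`, `DualGPTuner`, `beta`, and `window`, along with the toy reward surfaces, are invented for illustration. Only NumPy is used.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel (unit signal variance) between rows of a and b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


class SimpleGP:
    """Minimal zero-mean GP regressor with fixed (not fitted) kernel settings."""

    def __init__(self, noise=1e-3, length_scale=0.3):
        self.noise = noise
        self.length_scale = length_scale

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale)
        self.L = np.linalg.cholesky(K + self.noise * np.eye(len(self.X)))
        # alpha = (K + noise*I)^{-1} y via two solves against the Cholesky factor.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xq):
        Xq = np.asarray(Xq, dtype=float)
        Ks = rbf_kernel(Xq, self.X, self.length_scale)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        # Prior variance is 1 for this kernel; subtract the part explained by data.
        var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
        return mean, np.sqrt(var)


class DualGPTuner:
    """Hypothetical two-surrogate tuner (illustrative, not the paper's method).

    transfer_gp: fit once on past-task (hyperparameters, reward) pairs.
    A second GP is refit on current-task residuals w.r.t. transfer_gp, using
    only the last `window` observations (an assumption for non-stationarity).
    """

    def __init__(self, past_X, past_y, beta=0.2, window=20):
        self.transfer_gp = SimpleGP().fit(past_X, past_y)
        self.beta = beta
        self.window = window
        self.X, self.y = [], []

    def observe(self, x, reward):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(reward))

    def suggest(self, candidates):
        candidates = np.asarray(candidates, dtype=float)
        prior_mean, _ = self.transfer_gp.predict(candidates)
        if not self.X:
            # No current-task data yet: rely on the transferred model alone.
            return candidates[np.argmax(prior_mean)]
        X = np.stack(self.X[-self.window:])
        y = np.asarray(self.y[-self.window:])
        resid = y - self.transfer_gp.predict(X)[0]
        r_mean, r_std = SimpleGP().fit(X, resid).predict(candidates)
        # UCB on the combined prediction; exploration uses residual-GP uncertainty.
        ucb = prior_mean + r_mean + self.beta * r_std
        return candidates[np.argmax(ucb)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical reward surfaces over two hyperparameters scaled to [0, 1].
    past_opt, cur_opt = np.array([0.3, 0.6]), np.array([0.4, 0.5])
    past_X = rng.uniform(0, 1, size=(25, 2))
    past_y = -np.sum((past_X - past_opt) ** 2, axis=1)
    grid = np.linspace(0, 1, 11)
    cands = np.array([[a, b] for a in grid for b in grid])
    tuner = DualGPTuner(past_X, past_y)
    for _ in range(30):
        x = tuner.suggest(cands)
        reward = -np.sum((x - cur_opt) ** 2) + rng.normal(0, 0.01)
        tuner.observe(x, reward)
    best = tuner.X[int(np.argmax(tuner.y))]
    print("best hyperparameters found:", best)
```

Both surrogates operate on full hyperparameter vectors, so the same loop tunes several hyperparameters at once. This loosely mirrors the abstract's point about simultaneous multi-hyperparameter optimization. Kernel length scale and noise are fixed here for brevity; a practical version would likely fit them, for example by maximizing the marginal likelihood.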

Keywords: Hyperparameter; Transfer of learning; Computer science; Artificial intelligence; Machine learning

Related Documents

BanditWare: A Contextual Bandit-based Framework for Hardware Prediction

Active Learning for Streaming Data in A Contextual Bandit Framework

Optimizing risk transfer in dynamic insurance networks: A graph-based reinforcement learning framework

Reinforcement Learning for Economically Optimized Churn Management: A Contextual Bandit Framework

Contextual Bandit Learning-Based Viewport Prediction for 360 Video