JOURNAL ARTICLE

Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework

Farshad Seifi, Seyed Taghi Akhavan Niaki

Year: 2024
Journal: International Journal of Industrial Engineering Computations
Vol: 15 (4)
Pages: 951-964
Publisher: Growing Science

Abstract

The stochastic contextual bandit problem, recognized for its effectiveness in navigating the classic exploration-exploitation dilemma through ongoing player-environment interactions, has found broad applications across various industries. This utility largely stems from the algorithms’ ability to accurately forecast reward functions and maintain an optimal balance between exploration and exploitation, contingent upon the precise selection and calibration of hyperparameters. However, the inherently dynamic and real-time nature of bandit environments significantly complicates hyperparameter tuning, rendering traditional offline methods inadequate. While specialized methods have been developed to overcome these challenges, they often face three primary issues: difficulty in adaptively learning hyperparameters in ever-changing environments, inability to simultaneously optimize multiple hyperparameters for complex models, and inefficiencies in data utilization and knowledge transfer from analogous tasks. To tackle these hurdles, this paper introduces a transfer learning-based approach designed to harness past task knowledge for accelerated optimization and to dynamically optimize multiple hyperparameters, making it well-suited for fluctuating environments. The method employs a dual Gaussian meta-model strategy (one meta-model for transfer learning and the other for assessing hyperparameters’ performance within the current task), enabling it to leverage insights from previous tasks while quickly adapting to new environmental changes. Furthermore, the framework’s meta-model-centric architecture enables simultaneous optimization of multiple hyperparameters. Experimental evaluations demonstrate that this approach markedly outperforms competing methods in scenarios with perturbations and exhibits superior performance in 70% of stationary cases while matching performance in the remaining 30%. This performance advantage, coupled with computational efficiency on par with existing alternatives, positions it as a practical solution for optimizing hyperparameters in contextual bandit settings.
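
The abstract does not spell out how the two Gaussian meta-models are combined, so the following is only a minimal sketch of one common way to pair a transfer model with a current-task model, not the authors' algorithm. It assumes a single scalar hyperparameter scaled to [0, 1], a squared-exponential kernel, a residual-style combination (the current-task model is fitted to what the transferred model fails to explain), and an upper-confidence-bound rule for choosing the next value to try. The names `rbf`, `gp_posterior`, `suggest`, and the parameters `length_scale`, `noise`, and `beta` are all hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Zero-mean Gaussian process regression: posterior mean and std at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def suggest(x_past, y_past, x_cur, y_cur, candidates, beta=2.0):
    """Return the candidate hyperparameter with the highest upper confidence bound."""
    # Transfer model (assumption): a GP fitted to scores from a previous task.
    prior_at_cand, _ = gp_posterior(x_past, y_past, candidates)
    if len(x_cur) == 0:
        # No current-task data yet: use the transferred mean, uniform uncertainty.
        mean, std = prior_at_cand, np.ones_like(candidates)
    else:
        # Current-task model (assumption): a GP on the residuals left by the
        # transfer model, so new observations can override stale knowledge.
        prior_at_cur, _ = gp_posterior(x_past, y_past, x_cur)
        resid_mean, std = gp_posterior(x_cur, y_cur - prior_at_cur, candidates)
        mean = prior_at_cand + resid_mean
    return candidates[np.argmax(mean + beta * std)]

# Illustrative data only: the past task scored best near 0.3,
# while the current task's few observations point toward 0.7.
grid = np.linspace(0.0, 1.0, 11)
past_scores = -(grid - 0.3) ** 2
x_now = np.array([0.2, 0.5, 0.9])
y_now = -(x_now - 0.7) ** 2
print(suggest(grid, past_scores, x_now, y_now, grid))
```

In this sketch the transferred model gives a warm start before any current-task data exists, and the residual model lets fresh observations pull the estimate away from it when the environment has shifted. Extending the kernel to vector inputs would be one way to cover several hyperparameters at once, which is the capability the abstract attributes to the meta-model-centric design.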

Keywords:
Hyperparameter, Transfer of learning, Computer science, Artificial intelligence, Machine learning

Topics

Data Stream Mining Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Bandit Algorithms Research
Social Sciences → Decision Sciences → Management Science and Operations Research
Air Quality Monitoring and Forecasting
Physical Sciences → Environmental Science → Environmental Engineering