This paper presents a novel Unmanned Aerial Vehicle (UAV) path planning framework that integrates the UCB1-Tuned multi-armed bandit algorithm with fourth-order Runge-Kutta (RK4) numerical integration to address navigation in stochastic wind environments with obstacles. The proposed method discretizes the joint action space of wind direction and acceleration, which enables efficient exploration and exploitation for adaptive control under uncertainty. UCB1-Tuned uses both the empirical mean and the empirical variance of each action's reward to select control actions without requiring Bayesian sampling, which improves computational efficiency and stability. High-fidelity trajectories are generated by numerical integration that respects the UAV's dynamic constraints. Extensive simulations demonstrate that UCB1-Tuned combined with Runge-Kutta integration achieves a success rate above 99% and significantly reduces average energy consumption compared with random baselines and Euler integration variants. The approach learns favorable control policies by adapting dynamically to wind and obstacle conditions, maintaining both safety and energy efficiency. This work offers a scalable decision-making framework that combines principled learning with model-based trajectory prediction, a promising direction for real-time autonomous UAV navigation in complex and uncertain environments.
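As a rough illustration of the two ingredients named above, the following Python sketch pairs the standard UCB1-Tuned index (for rewards scaled to [0, 1]) with a generic RK4 step. It is a minimal sketch under stated assumptions, not the paper's implementation: the action grid values, the point-mass dynamics with additive wind, and all function names are hypothetical.

```python
import itertools
import math

# Hypothetical discretization (not from the paper): 8 directions x 3 accelerations.
DIRECTIONS = [k * math.pi / 4 for k in range(8)]   # radians
ACCELERATIONS = [0.5, 1.0, 2.0]                    # m/s^2
ACTIONS = list(itertools.product(DIRECTIONS, ACCELERATIONS))  # one bandit arm each


def ucb1_tuned_index(count, total, reward_sum, reward_sq_sum):
    """UCB1-Tuned index for one arm, assuming rewards lie in [0, 1]."""
    mean = reward_sum / count
    # Empirical variance plus an exploration term, as an upper bound on variance.
    variance_bound = (max(0.0, reward_sq_sum / count - mean ** 2)
                      + math.sqrt(2.0 * math.log(total) / count))
    # 1/4 is the largest possible variance of a [0, 1]-valued reward.
    return mean + math.sqrt(math.log(total) / count * min(0.25, variance_bound))


def select_action(counts, reward_sums, reward_sq_sums):
    """Return the index of the arm to play next."""
    # Play every arm once before using the index.
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: ucb1_tuned_index(
            counts[a], total, reward_sums[a], reward_sq_sums[a]),
    )


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for d(state)/dt = f(state)."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def make_dynamics(direction, accel, wind):
    """Assumed 2D point-mass model: state = [x, y, vx, vy], wind adds to velocity."""
    def f(state):
        _, _, vx, vy = state
        return [
            vx + wind[0],
            vy + wind[1],
            accel * math.cos(direction),
            accel * math.sin(direction),
        ]
    return f
```

In a planning loop built on this sketch, each step would call `select_action`, integrate the chosen `(direction, accel)` pair with `rk4_step`, and then update that arm's count, reward sum, and squared-reward sum with the observed reward.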
Edouard Fouché, Junpei Komiyama, Klemens Böhm