Cloud platforms often rely on reactive, threshold-based auto-scaling, which can lead to both over-provisioning (wasted cost) and under-provisioning (performance degradation) under dynamic workloads. We present a fully integrated framework that forecasts short-term resource demands using hybrid time-series models (LSTM neural networks + ARIMA) and drives proactive scaling decisions via a dual-stage optimizer combining Deep Q-Learning (DQN) and Genetic Algorithms (GA). Deployed on a local Kubernetes testbed, our solution achieves over 90% forecasting accuracy (RMSE < 0.05), reduces operational cost by ~25%, and improves average CPU utilization from 60% to 85%, while maintaining sub-200 ms scaling latencies. This hybrid approach also yields an estimated 15% energy savings by minimizing idle resources, demonstrating a practical path toward cost- and energy-efficient cloud resource management.
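To illustrate the proactive-scaling idea described above (this is a hedged sketch, not the authors' implementation), the snippet below uses a naive moving-average-plus-trend predictor as a hypothetical stand-in for the hybrid LSTM+ARIMA forecaster, then maps the predicted CPU demand to a replica count targeting ~85% per-pod utilization, so capacity is added before the demand arrives rather than after a threshold is breached. All function names and parameters here are illustrative assumptions.

```python
import math
from statistics import mean

def forecast_demand(history, window=3):
    """Naive moving-average + trend forecast of next-interval CPU demand.
    A hypothetical, simplified stand-in for the paper's hybrid
    LSTM+ARIMA predictor."""
    recent = history[-window:]
    trend = recent[-1] - recent[0]          # rise over the window
    steps = max(len(recent) - 1, 1)
    return mean(recent) + trend / steps     # extrapolate one step ahead

def replicas_needed(predicted_cpu, per_pod_capacity=0.85, min_replicas=1):
    """Map predicted total CPU demand (cores) to a pod count,
    targeting roughly 85% utilization per pod (an assumed target,
    chosen to echo the utilization figure reported in the abstract)."""
    return max(min_replicas, math.ceil(predicted_cpu / per_pod_capacity))

# Rising load: scale out *before* demand exceeds current capacity.
cpu_history = [1.0, 1.4, 1.9]  # cores consumed over recent intervals
predicted = forecast_demand(cpu_history)
print(replicas_needed(predicted))  # → 3
```

In a real deployment this decision loop would run ahead of the Kubernetes Horizontal Pod Autoscaler's reactive cycle, which is what lets a proactive system avoid the lag-induced under-provisioning the abstract describes.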
Raissa Uskenbayeva, A.A. Kuandykov, Y.I. Cho, Zh.B. Kalpeyeva
B. S. Murugan, V. Vasudevan, B. Ganeshpandi