Reinforcement learning for bond portfolio management: an actor-critic approach

Manuel Nunes; Enrico Gerding; Frank McGroarty; Mahesan Niranjan

doi:10.1080/1351847x.2025.2605061

JOURNAL ARTICLE

Year: 2026; Journal: European Journal of Finance; Pages: 1-28; Publisher: Taylor & Francis

Abstract

Portfolio management poses unique challenges for traditional forecasting methods because it is a complex, sequential decision-making process. This study leverages reinforcement learning (RL) to address these challenges, focusing on fixed-income portfolio management. We develop a novel autonomous RL system using a custom environment for bond exchange-traded fund (ETF) dynamics and the Deep Deterministic Policy Gradient (DDPG) algorithm. Unlike prior studies that merely report algorithmic instability, our work systematically addresses this issue by introducing a robust agent selection process during training. To illustrate the practical benefits, we construct a simple equally weighted ensemble of selected agents that outperforms the static buy-and-hold benchmark by 4.3% and achieves a total return comparable to the portfolio's best-performing asset, while exhibiting superior risk characteristics during periods of market stress. Our approach also introduces methodological innovations, including a scaled reward structure to improve learning in bond markets. Although the DDPG algorithm exhibits instability, our results demonstrate that this challenge can be systematically mitigated through robust agent selection and ensemble methods. These findings establish RL as a powerful tool for financial strategies where direct forecasting is complex and uncertain, offering a practical framework for implementation in fixed-income markets.
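The abstract does not say how the equally weighted ensemble combines the selected agents. One plausible reading, sketched below, is that each agent proposes a vector of portfolio weights and the ensemble takes their simple average. This is an illustrative assumption, not the authors' implementation: the names `Policy`, `normalize` and `ensemble_weights`, the long-only clipping, and the toy policies are all hypothetical.

```python
from typing import Callable, List, Sequence

Weights = List[float]
# Hypothetical interface: a trained agent maps a market state to portfolio weights.
Policy = Callable[[Sequence[float]], Weights]


def normalize(weights: Sequence[float]) -> Weights:
    """Clip negative weights to zero and rescale so they sum to 1 (long-only assumption)."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        # Fall back to an equal allocation if every weight was clipped away.
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def ensemble_weights(policies: Sequence[Policy], state: Sequence[float]) -> Weights:
    """Equally weighted ensemble: average the allocations proposed by each agent."""
    proposals = [normalize(policy(state)) for policy in policies]
    n_assets = len(proposals[0])
    averaged = [
        sum(proposal[i] for proposal in proposals) / len(proposals)
        for i in range(n_assets)
    ]
    return normalize(averaged)


# Toy usage: two hypothetical agents that each favour a different bond ETF.
agent_a: Policy = lambda state: [1.0, 0.0]
agent_b: Policy = lambda state: [0.0, 1.0]
combined = ensemble_weights([agent_a, agent_b], state=[0.0])
```

Averaging allocations rather than picking a single agent is one way to dampen the run-to-run instability the abstract attributes to DDPG, since no single training run then determines the portfolio.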

Keywords: Reinforcement learning; Portfolio; Benchmark (surveying); Construct (python library); Project portfolio management; Bond; Process (computing); Selection (genetic algorithm)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.27

Topics

Adaptive Dynamic Programming Control

Physical Sciences → Computer Science → Computational Theory and Mathematics

Stock Market Forecasting Methods

Social Sciences → Decision Sciences → Management Science and Operations Research

Risk and Portfolio Optimization

Social Sciences → Decision Sciences → Management Science and Operations Research
