Hierarchical Control for USV Trajectory Tracking with Proactive–Reactive Reward Shaping

Zixiao Luo; Dongmei Du; Dandan Liu; Qiangqiang Yang; Yi Chai; Shiyu Hu; Jiayou Wu

doi:10.3390/jmse13122392


JOURNAL ARTICLE


Year: 2025; Journal: Journal of Marine Science and Engineering; Vol: 13 (12); Pages: 2392; Publisher: Multidisciplinary Digital Publishing Institute


Abstract

To address trajectory tracking of underactuated unmanned surface vessels (USVs) under disturbances and model uncertainty, we propose a hierarchical control framework that combines model predictive control (MPC) with proximal policy optimization (PPO). The outer loop runs in the inertial reference frame, where an MPC planner based on a kinematic model enforces velocity and safety constraints and generates feasible body–fixed velocity references. The inner loop runs in the body–fixed reference frame, where a PPO policy learns the nonlinear inverse mapping from velocity to multi–thruster thrust, compensating hydrodynamic modeling errors and external disturbances. On top of this framework, we design a Proactive–Reactive Adaptive Reward (PRAR) that uses the MPC prediction sequence and real–time pose errors to adaptively reweight the reward across surge, sway and yaw, improving robustness and cross–model generalization. Simulation studies on circular and curvilinear trajectories compare the proposed PRAR–driven dual–loop controller (PRAR–DLC) with MPC–PID, PPO–Only, MPC–PPO and PPO variants. On the curvilinear trajectory, PRAR–DLC reduces surge MAE and maximum tracking error from 0.269 m and 0.963 m (MPC–PID) to 0.138 m and 0.337 m, respectively; on the circular trajectory it achieves about an 8.5% reduction in surge MAE while maintaining comparable sway and yaw accuracy to the baseline controllers. Real–time profiling further shows that the average MPC and PPO evaluation times remain below the control sampling period, indicating that the proposed architecture is compatible with real–time onboard implementation and physical deployment.
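
The abstract does not give the PRAR formula, so the following is only a minimal sketch of how a proactive-reactive reweighting of this kind could look, not the authors' implementation. It assumes the proactive signal is the mean step-to-step change of the MPC velocity reference over the horizon, the reactive signal is the magnitude of the current pose error, and the two are blended by a fixed factor. The names `prar_weights`, `shaped_reward`, `alpha` and the quadratic penalty are all hypothetical choices made for illustration.

```python
def _normalize(values, eps=1e-9):
    """Scale non-negative values to sum to 1; fall back to uniform if all ~0."""
    total = sum(values)
    if total <= eps:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def prar_weights(mpc_ref_seq, pose_err, alpha=0.5):
    """Hypothetical per-channel weights for (surge, sway, yaw).

    mpc_ref_seq: list of (u, v, r) velocity references over the MPC horizon.
    pose_err:    current (surge, sway, yaw) tracking errors.
    alpha:       assumed blend factor; 1.0 = purely proactive, 0.0 = purely reactive.
    """
    # Proactive term (assumption): how much each reference channel is about
    # to change over the prediction horizon.
    steps = max(len(mpc_ref_seq) - 1, 1)
    proactive = [0.0, 0.0, 0.0]
    for prev, nxt in zip(mpc_ref_seq, mpc_ref_seq[1:]):
        for i in range(3):
            proactive[i] += abs(nxt[i] - prev[i]) / steps

    # Reactive term (assumption): magnitude of the current error per channel.
    reactive = [abs(e) for e in pose_err]

    p = _normalize(proactive)
    r = _normalize(reactive)
    return [alpha * pi + (1.0 - alpha) * ri for pi, ri in zip(p, r)]


def shaped_reward(mpc_ref_seq, pose_err, alpha=0.5):
    """Weighted quadratic penalty on the per-channel errors (illustrative only)."""
    weights = prar_weights(mpc_ref_seq, pose_err, alpha)
    return -sum(w * e * e for w, e in zip(weights, pose_err))


if __name__ == "__main__":
    # The reference is about to demand a yaw-rate change while the vessel
    # currently lags in surge, so weight is split between surge and yaw.
    horizon = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.1), (1.0, 0.0, 0.2)]
    error = (0.2, 0.0, 0.0)
    print(prar_weights(horizon, error))
    print(shaped_reward(horizon, error))
```

In this sketch the weights always sum to 1, so the reward scale stays comparable across time steps while the emphasis shifts between channels. A learned or scheduled blend could replace the fixed `alpha`.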

