Abstract: To address the challenges that unmanned aerial vehicles (UAVs) face in autonomous navigation within unknown dynamic environments, including partial observability, high training difficulty, and slow convergence, a novel deep reinforcement learning navigation method based on Priority Replay-Gated-Soft Actor-Critic (PG-SAC) is proposed. First, gated recurrent units (GRUs) are introduced in both the policy network and the value network to integrate the current state with historical observations, which mitigates partial observability and improves the UAV’s decision-making in complex environments. Second, a prioritized experience replay (PER) mechanism is incorporated, which assigns higher sampling priority to experience samples with large temporal difference (TD) errors, so that the UAV focuses on learning from critical experiences, improving learning efficiency and convergence speed. Third, a non-sparse reward function is designed to address the difficulty of training reinforcement learning agents under sparse rewards and to ease exploration. Finally, the algorithm is trained in a three-dimensional simulation environment built on the Unreal Engine platform. Experimental results show that, compared with mainstream deep reinforcement learning algorithms, PG-SAC converges faster and achieves higher reward values. In addition, compared with the previous version of the algorithm, the navigation success rate is improved by 18.75% and the average flight time is reduced by 19.33%.
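
To make the PER idea concrete, the following is a minimal sketch of proportional prioritized sampling, in which a transition's sampling probability grows with the magnitude of its TD error. It is an illustration under stated assumptions, not the implementation used in this work: the class name `PrioritizedReplayBuffer`, the exponents `alpha` and `beta`, the offset `eps`, and the list-based storage are all hypothetical choices made for readability.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # assumed exponent: 0 gives uniform sampling
        self.eps = eps      # assumed offset so zero-error samples stay reachable
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so each one can be sampled before its TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, the new priority is |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop of this kind, the critic update would typically scale each sample's loss by the returned weight and then call `update_priorities` with the freshly computed TD errors. A list-based buffer costs O(N) per sample; larger buffers usually rely on a sum-tree instead.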