UAV Autonomous Navigation and Obstacle Avoidance Method Based on PG-SAC
Affiliations:

1. School of Intelligent Science and Technology, Xinjiang University; 2. Institute for Aero Engine, Tsinghua University

CLC number:

V279; V249.3

Funding:

National Natural Science Foundation of China (2024TSYCCX0023); Xinjiang Uygur Autonomous Region Tianshan Talents Program, Young Top-Notch Talent Project (62263030)

    Abstract:

    To address the partial observability, training difficulty, and slow convergence that unmanned aerial vehicles (UAVs) face during autonomous navigation in unknown dynamic environments, a deep reinforcement learning navigation method based on Prioritized Replay-Gated-Soft Actor-Critic (PG-SAC) is proposed. First, gated recurrent units (GRUs) are introduced into both the policy network and the value network to integrate the current state with historical observations, which addresses partial observability and improves the UAV's decision-making in complex environments. Second, a prioritized experience replay (PER) mechanism assigns higher sampling priority to experience samples with large temporal-difference (TD) errors, so that the UAV concentrates on learning from critical experiences, improving learning efficiency and convergence speed. Third, a non-sparse reward function is designed to overcome the training difficulty and insufficient exploration that reinforcement learning policies suffer under sparse rewards. Finally, the method is trained in a three-dimensional simulation environment built on the Unreal Engine simulation platform. Experimental results show that, compared with mainstream deep reinforcement learning algorithms, PG-SAC converges faster and attains higher reward values. Compared with the algorithm before improvement, the navigation success rate increases by 18.75% and the average flight time decreases by 19.33%.
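The prioritized experience replay step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no hyperparameters or data structures, so the class name `PrioritizedReplayBuffer`, the proportional-priority variant, and the values of `alpha`, `beta`, and `eps` are all assumptions chosen for illustration. The sketch only shows the core idea that transitions with larger absolute TD error are sampled more often, with importance-sampling weights to offset the resulting bias.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # assumed value; 0 would give uniform sampling
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current maximum priority so that each
        # one has a good chance of being replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight
        # in the batch so they only scale updates downward.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a critic update, the priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


# Usage: the transition with the large TD error dominates the samples.
random.seed(0)
buf = PrioritizedReplayBuffer(capacity=4)
for t in ["a", "b", "c", "d"]:
    buf.add(t)
buf.update_priorities([0, 1, 2, 3], [0.01, 0.01, 0.01, 5.0])
batch, idxs, weights = buf.sample(1000)
print(batch.count("d"))
```

In an actor-critic training loop, the returned weights would typically multiply the per-sample critic loss, and `update_priorities` would be called with the fresh TD errors after each gradient step. A list-based buffer keeps the sketch short; sampling costs O(n) per batch, which a sum-tree would reduce to O(log n) per draw.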

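Cite this article

Sun Mingsheng, Li Xinkai, Meng Yue, et al. UAV Autonomous Navigation and Obstacle Avoidance Method Based on PG-SAC[J]. Science Technology and Engineering, , ():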

History
  • Received: 2025-12-22
  • Last revised: 2026-05-06
  • Accepted: 2026-05-15