Traffic Flow Data Repair Method Based on Random Forest and Nearest Neighbor Interpolation

CLC number: U491.5

Fund project: Key Research and Development Program of Shaanxi Province (2022GY-335)

    Abstract:

    During data collection, traffic sensors can lose or corrupt records because of weather or equipment faults, so accurate patterns of traffic variation cannot be extracted from the collected data. To address this problem, two repair models for missing traffic flow data are proposed: one based on an improved nearest neighbor interpolation algorithm and one based on random forest imputation. Because traffic data differ in missing scenario, missing type, and spatio-temporal correlation, missing data are divided into two types: simple random missing and complex continuous missing. A model built on the improved nearest neighbor interpolation algorithm handles simple random missing, and a random forest model with iterative imputation handles complex continuous missing. For both missing types, baseline models built with the expectation maximization (EM) algorithm, a deep belief network (DBN), and a seasonal autoregressive integrated moving average (SARIMA) model are used for cross-validated comparison with the improved nearest neighbor interpolation and random forest imputation methods.

    The data are traffic flow records collected in real time by PeMS in California, USA, from June 1, 2022 to July 31, 2022, at a 5 min sampling interval. To simulate missing data, values were removed from the complete data at set proportions, yielding traffic flow datasets with simple random and complex continuous missing patterns. The results show that the proposed methods perform well at every missing ratio tested and, across the designed missing ratios and types, hold a clear advantage over the baselines on each evaluation metric, verifying the effectiveness of both imputation models.
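The abstract does not spell out the "improved" nearest neighbor rule or name the evaluation metrics. As a rough illustration of the simple random missing setting only, the sketch below hides a fixed proportion of points at random from a 5 min flow series, fills each gap with the value of the closest observed time step (plain nearest neighbor interpolation, not the authors' improved variant), and scores the fill on the hidden points. The function names, the tie-breaking rule (the earlier neighbor wins), the synthetic data, and the choice of MAE and RMSE are all assumptions made for illustration.

```python
import bisect
import math
import random


def mask_random(series, ratio, seed=0):
    """Simple random missing: hide round(ratio * n) randomly chosen points (None = missing)."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(series)), round(ratio * len(series))))
    hidden = set(idx)
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, idx


def nearest_neighbor_fill(masked):
    """Fill each None with the value of the closest observed time step (earlier one wins ties)."""
    observed = [i for i, v in enumerate(masked) if v is not None]
    if not observed:
        raise ValueError("no observed values to interpolate from")
    filled = list(masked)
    for i, v in enumerate(masked):
        if v is None:
            pos = bisect.bisect_left(observed, i)
            # Candidates: last observed index before i and first observed index after i.
            candidates = observed[max(pos - 1, 0):pos + 1]
            nearest = min(candidates, key=lambda k: (abs(k - i), k))
            filled[i] = masked[nearest]
    return filled


def mae_rmse(truth, filled, idx):
    """MAE and RMSE computed only on the positions that were masked (assumed metrics)."""
    errs = [filled[i] - truth[i] for i in idx]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse


# Hypothetical single day of 5 min flow: 24 h x 12 samples/h = 288 points.
day = [round(300 + 200 * math.sin(2 * math.pi * t / 288)) for t in range(288)]
masked, idx = mask_random(day, ratio=0.1, seed=42)
filled = nearest_neighbor_fill(masked)
print("MAE = %.2f, RMSE = %.2f" % mae_rmse(day, filled, idx))
```

Because nearest neighbor filling copies a real observation, it stays on the observed scale, but it cannot follow a trend across a gap; this is one reason such a rule suits short, isolated gaps better than long runs.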

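For complex continuous missing, the abstract states only that a random forest model fills the gaps iteratively. The sketch below shows one common form of that idea, missForest-style iterative imputation, under an assumed layout: one row per 5 min time step and one column per detector, so a long gap at one detector is predicted from other detectors observed at the same time. The small regression forest is written from scratch to stay dependency-free. The tree depth, forest size, mean initialization, fixed number of passes, synthetic data, and all function names are hypothetical choices, not the authors' settings.

```python
import math
import random
from statistics import fmean


def _sse(ys):
    """Sum of squared deviations from the mean: the regression-tree split criterion."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(X, y, rng, depth=4, min_leaf=3):
    """Grow a regression tree; each node searches a random subset of features."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return fmean(y)  # leaf value: mean target
    n_feat = len(X[0])
    best = None
    for j in rng.sample(range(n_feat), max(1, n_feat // 3)):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return fmean(y)
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            fit_tree([X[i] for i in li], [y[i] for i in li], rng, depth - 1, min_leaf),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], rng, depth - 1, min_leaf))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def fit_forest(X, y, rng, n_trees=20):
    """Bagging: every tree sees a bootstrap resample of the training rows."""
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(fit_tree([X[i] for i in boot], [y[i] for i in boot], rng))
    return forest


def predict_forest(forest, row):
    return fmean([predict_tree(tree, row) for tree in forest])


def _drop(row, j):
    """Features for column j: every other column of the same row."""
    return row[:j] + row[j + 1:]


def rf_impute(data, n_iter=3, seed=0):
    """Hypothetical missForest-style imputer. Rows = time steps, columns = detectors (>= 2), None = missing."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    # Initial guess: each column's observed mean.
    means = [fmean([r[j] for r in data if r[j] is not None]) for j in range(n_cols)]
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in data]
    for _ in range(n_iter):  # fixed passes; missForest itself uses a stopping criterion
        for j in range(n_cols):
            miss = [i for i in range(n_rows) if data[i][j] is None]
            if not miss:
                continue
            obs = [i for i in range(n_rows) if data[i][j] is not None]
            forest = fit_forest([_drop(filled[i], j) for i in obs],
                                [filled[i][j] for i in obs], rng)
            for i in miss:
                filled[i][j] = predict_forest(forest, _drop(filled[i], j))
    return filled


# Hypothetical 3-detector table: 72 steps of a synthetic periodic profile;
# detectors 1 and 2 are scaled copies of detector 0.
profile = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(72)]
truth = [[p, 1.1 * p, 0.9 * p] for p in profile]
data = [row[:] for row in truth]
for t in range(30, 40):  # continuous gap: detector 0 silent for 10 steps
    data[t][0] = None
imputed = rf_impute(data)
```

Using other detectors at the same time step as predictors is what lets the forest bridge a long gap that nearest neighbor filling would flatten; temporal features such as time of day or lagged values could be added as extra columns in the same way.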
Cite this article:

Tang Wei, Qi Suying, Yang Xiaodong, et al. Traffic Flow Data Repair Method Based on Random Forest and Nearest Neighbor Interpolation[J]. Science Technology and Engineering, 2024, 24(32): 14056-14065.

History
  • Received: 2023-09-25
  • Last revised: 2024-09-06
  • Accepted: 2024-05-21
  • Published online: 2024-11-28