Abstract: During traffic data collection, weather or equipment failure can leave sensor readings missing or abnormal, so that accurate traffic change patterns cannot be obtained from the collected data. To address this problem, two traffic flow missing-data repair models are proposed, one based on an improved nearest neighbor interpolation algorithm and one based on random forest interpolation. Because traffic data differ in missing scenario, missing type, and spatio-temporal correlation, missing data are divided into two types: simple random missing and complex continuous missing. A model based on the improved nearest neighbor interpolation algorithm is established to handle simple random missing, and a random forest model is established to iteratively impute complex continuous missing. For both types of missing data, the expectation maximization algorithm, a deep belief network, and a seasonal autoregressive integrated moving average (SARIMA) model are used as comparison methods and are evaluated under cross-validation against the improved nearest neighbor interpolation algorithm and the random forest interpolation method. The data are real-time traffic flow data collected by the California PeMS (Performance Measurement System) from June 1, 2022 to July 31, 2022 at a sampling interval of 5 min. To simulate missing data, values are removed from the complete data at specified proportions, yielding data sets with simple random missing and complex continuous missing distributions. Across the different missing ratios and missing types tested, the two models perform well and show clear advantages on every evaluation index, which verifies the effectiveness of the two missing-data imputation models.
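The way the two kinds of missing data are simulated can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 10% missing ratio, the one-hour block length (12 samples), and the synthetic sinusoidal flow series are all assumptions made for illustration. Only the 5 min sampling interval comes from the abstract.

```python
import math
import random

POINTS_PER_DAY = 24 * 60 // 5  # 288 samples per day at a 5 min interval


def random_missing_mask(n, ratio, rng):
    """Mark round(n * ratio) isolated positions, chosen uniformly, as missing."""
    mask = [False] * n
    for i in rng.sample(range(n), round(n * ratio)):
        mask[i] = True
    return mask


def continuous_missing_mask(n, ratio, block_len, rng):
    """Mark contiguous blocks of block_len samples as missing until at least
    round(n * ratio) samples are gone (may overshoot by less than block_len)."""
    target = round(n * ratio)
    mask = [False] * n
    missing = 0
    while missing < target:
        start = rng.randrange(0, n - block_len + 1)
        for i in range(start, start + block_len):
            if not mask[i]:
                mask[i] = True
                missing += 1
    return mask


def apply_mask(series, mask):
    """Replace masked samples with NaN, leaving the original series untouched."""
    return [math.nan if m else x for x, m in zip(series, mask)]


# Hypothetical stand-in for one week of complete 5 min flow data.
rng = random.Random(0)
flow = [200 + 150 * math.sin(2 * math.pi * t / POINTS_PER_DAY)
        for t in range(7 * POINTS_PER_DAY)]

random_mask = random_missing_mask(len(flow), 0.10, rng)
block_mask = continuous_missing_mask(len(flow), 0.10, 12, rng)

flow_random = apply_mask(flow, random_mask)   # simple random missing
flow_blocks = apply_mask(flow, block_mask)    # complex continuous missing
```

Keeping the mask separate from the series means the removed ground-truth values stay available, so any imputation method can later be scored only on the positions that were deliberately removed.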