Abstract: To address the insufficient accuracy and error control of departure taxi-out time prediction, this paper proposes DDA-SIM-ATT-CatBoost, a prediction model that integrates dynamic data augmentation (DDA), similarity theory (SIM), and an attention mechanism (ATT). First, based on historical operational data, three categories of features (surface traffic flow, flight attributes, and operational environment) are systematically analyzed, and the key influencing factors are selected through correlation analysis. Next, dynamic data augmentation is employed to expand the training sample distribution, similarity theory is applied to match samples in the multi-dimensional feature space, and an attention mechanism is introduced to adaptively calibrate feature weights. Finally, the CatBoost algorithm is used for regression prediction, leveraging its strengths in handling categorical features and complex nonlinear relationships. Comparative and ablation experiments on actual operational data from a domestic hub airport show that the proposed model achieves prediction accuracies of 74.57%, 89.12%, and 97.76% within error margins of ±120 s, ±180 s, and ±300 s, respectively, with MAPE, MAE, and RMSE values of 10.34%, 87.55 s, and 125.61 s. The model significantly outperforms the comparison models. Each added module improves performance, and the integrated DDA-SIM-ATT-CatBoost model, through the synergy of its modules, achieves the best prediction accuracy and stability, providing reliable decision support for airport ground operation scheduling.
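
The reported evaluation metrics can be illustrated with a minimal sketch. It assumes that "prediction accuracy within ±X s" means the percentage of flights whose absolute prediction error is at most X seconds, and that MAPE, MAE, and RMSE follow their standard definitions; the abstract does not state the exact formulas. The function name `taxi_out_metrics` and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def taxi_out_metrics(actual, predicted, tolerances=(120, 180, 300)):
    """Compute error metrics for taxi-out time predictions (all times in seconds).

    Assumption: every actual taxi-out time is positive, so MAPE is well defined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)

    metrics = {
        # Mean absolute error, in seconds.
        "MAE": sum(abs_errors) / n,
        # Root mean squared error, in seconds.
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        # Mean absolute percentage error, in percent.
        "MAPE": 100.0 * sum(ae / a for ae, a in zip(abs_errors, actual)) / n,
    }
    # Assumed definition: share of flights (in percent) whose absolute
    # error does not exceed the tolerance.
    for tol in tolerances:
        hits = sum(1 for ae in abs_errors if ae <= tol)
        metrics[f"acc_within_{tol}s"] = 100.0 * hits / n
    return metrics


# Hypothetical sample: four departures, times in seconds.
actual = [600, 900, 1200, 750]
predicted = [650, 1100, 1180, 400]
result = taxi_out_metrics(actual, predicted)
```

Under these assumed definitions, the tolerance-based accuracies describe how often the prediction is operationally usable, while MAE and RMSE describe the typical and outlier-sensitive error magnitudes.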