Abstract: Accurate prediction of nearshore tsunami wave heights is essential for the rapid quantitative assessment of disaster risk, and it supports both the improvement of coastal disaster prevention and mitigation systems and the reduction of losses caused by extreme marine disasters. Current tsunami warning assessments rely mainly on numerical simulation tools such as the Cornell Multi-grid Coupled Tsunami Model (COMCOT). By solving the shallow water equations, these tools can rapidly evaluate the nearshore impacts of strong tsunamis generated by large earthquakes within the limited warning time available. However, because fault slip and rupture processes are represented in simplified form, and because bottom friction, seabed roughness, and the spatial heterogeneity of nearshore bathymetry are insufficiently characterized, the simulation results often show significant biases. To address these biases, a dataset covering 33 major tsunami events worldwide was constructed from historical tsunami observation data and numerical simulation results, and an XGBoost-based prediction model for maximum nearshore tsunami wave heights was developed. The proposed model uses the numerical simulation results as a baseline and integrates multi-source features, including seismic source parameters, the distance between survey points and faults, and geographic information. The results showed that the proposed model effectively reduced the bias in the original numerical simulations, achieving a correlation coefficient of 0.91 between the predicted and observed nearshore tsunami wave heights. Compared with the original simulation results, the mean absolute error was reduced from 5.67 m to 1.40 m, and the root mean square error was reduced from 8.13 m to 2.45 m. In an independent validation using the 2010 Chile tsunami event, the proposed model demonstrated stable generalization capability.
The proposed method can significantly improve the accuracy of nearshore tsunami wave height prediction without additional computational cost, thereby providing a promising approach for balancing timeliness and accuracy in global tsunami early warning systems.
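The three accuracy measures quoted above (mean absolute error, root mean square error, and correlation coefficient) can be computed as in the following minimal sketch. It is only an illustration of the metrics, not the evaluation code used in this study: the function name `evaluate_heights` is hypothetical, the wave heights are synthetic placeholder values rather than data from the 33-event dataset, and the Pearson correlation coefficient is assumed.

```python
import numpy as np

def evaluate_heights(observed, predicted):
    """Return MAE, RMSE (both in metres) and Pearson r for paired wave heights."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - observed
    mae = float(np.mean(np.abs(error)))                # mean absolute error
    rmse = float(np.sqrt(np.mean(error ** 2)))         # root mean square error
    r = float(np.corrcoef(observed, predicted)[0, 1])  # Pearson correlation
    return {"mae": mae, "rmse": rmse, "r": r}

# Synthetic placeholder heights in metres (NOT values from the study).
observed = [1.0, 2.0, 4.0, 8.0]    # hypothetical survey observations
simulated = [2.0, 4.0, 7.0, 13.0]  # hypothetical raw numerical-simulation output
corrected = [1.5, 2.0, 3.5, 8.5]   # hypothetical output after a learned correction

baseline_scores = evaluate_heights(observed, simulated)
corrected_scores = evaluate_heights(observed, corrected)
print("simulation:", baseline_scores)
print("corrected: ", corrected_scores)
```

Applying the same function to the raw simulation and to the corrected prediction gives a like-for-like comparison of the kind reported in the abstract.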