Abstract: Rock thin section images are difficult to classify because of large intra-class variation and high inter-class similarity. To address this, this paper proposes WD-DDPM, a dual-path classification model that integrates local frequency-domain features with global semantics. The first path builds on ResNet-50 and replaces the traditional downsampling operation with a Wavelet Transform based Multi-scale Downsampling Block (WTMDB). The block performs wavelet decomposition on the image, retaining low-frequency contour information while using an attention mechanism to enhance edge and texture details in the high-frequency subbands, which improves the model's ability to extract locally discriminative features. The second, parallel path uses the U-Net encoder of a diffusion model to extract global contextual features that capture overall structure and long-range dependencies. The features from the two paths, representing local details and global semantics, are then fused into a complementary image representation and fed into a classifier for lithology identification. In experiments on a rock thin section dataset containing 108 sub-categories, the method achieves accuracy, recall, and F1-score of 97.5%, 94.7%, and 96%, respectively, significantly outperforming mainstream convolutional models such as VGG16 and MobileNetV3 and demonstrating its effectiveness for complex rock image classification.
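
To make the idea of wavelet-based downsampling concrete, the following is a minimal sketch of a single-level 2D Haar decomposition, which halves the spatial resolution while splitting the input into one low-frequency subband and three high-frequency subbands. It is an illustration only and not the WTMDB itself: the choice of the Haar wavelet, the function names `haar_dwt2` and `wavelet_downsample`, and the fixed `hf_gain` factor (a hypothetical stand-in for the learned attention weights) are all assumptions that do not come from the paper.

```python
# Illustrative sketch only (assumptions: Haar wavelet, fixed gain instead of
# learned attention). Pure Python, operating on a 2D grid given as nested lists.

def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of an even-sized 2D grid.

    Returns four subbands at half resolution. Subband naming conventions vary
    between libraries; here LH holds column differences, HL row differences.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels that collapses into one coefficient.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # low-frequency approximation
            lh_row.append((a - b + d - e) / 2.0)  # differences across columns
            hl_row.append((a + b - d - e) / 2.0)  # differences across rows
            hh_row.append((a - b - d + e) / 2.0)  # diagonal differences
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def wavelet_downsample(image, hf_gain=1.0):
    """Return [LL, LH, HL, HH] channels, scaling the high-frequency subbands.

    hf_gain is a hypothetical fixed factor standing in for an attention
    mechanism, which would instead learn data-dependent weights.
    """
    ll, lh, hl, hh = haar_dwt2(image)

    def scale(band):
        return [[hf_gain * v for v in row] for row in band]

    return [ll, scale(lh), scale(hl), scale(hh)]


if __name__ == "__main__":
    # A 4x4 grid with a vertical edge between the second and third columns.
    img = [[0, 0, 9, 9] for _ in range(4)]
    for name, band in zip(("LL", "LH", "HL", "HH"), wavelet_downsample(img)):
        print(name, band)
```

Unlike strided convolution or pooling, this decomposition discards nothing: the four half-resolution subbands together preserve the energy of the input, so edge and texture information remains available to later layers in the high-frequency channels.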