Abstract: To address the challenges of detecting surface defects in subway tunnels, including blurred boundaries, multi-scale variation, slender shapes, and class imbalance, an improved model, SegFormer-ESF, is proposed. First, a parallel edge information enhancement branch is designed, in which fixed Sobel operators and depthwise separable convolutions extract and enhance gradient features. These edge features are adaptively fused with the backbone features through a feature fusion module, improving the perception and localization of defect boundaries. Second, a directional strip convolutional attention module is embedded in the encoder. It employs 1×K and K×1 convolutions to model the geometric structure and directional context of linear defects such as cracks and leakage, thereby strengthening the representation of slender objects. Third, a lightweight multi-scale pyramid feature fusion module is introduced to reconstruct the decoder, improving multi-scale feature integration while reducing model complexity. Finally, to mitigate the class imbalance caused by the low pixel proportion of defects, a combined Focal-Dice loss function is proposed, which jointly optimizes hard example mining and overall region recognition. Experimental results show that SegFormer-ESF achieves 83.53% mIoU, 92.46% mPA, and 97.61% accuracy, improvements of 3.72%, 3.51%, and 0.46%, respectively, over SegFormer-B0. Moreover, the model has only 3.620M parameters and 3.027G FLOPs, reductions of 2.56% and 55.46% relative to the baseline, so it is lighter while also being more accurate. At 88.53 FPS, it meets real-time detection requirements and offers a practical solution for intelligent defect identification and deployment on mobile platforms.
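
The combined Focal-Dice loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: a binary (defect vs. background) setting over flattened pixel probabilities, the commonly used focal parameters gamma = 2 and alpha = 0.25, a Dice smoothing constant of 1, and equal weights w_focal and w_dice for the two terms. The function names are hypothetical and are not taken from the paper.

```python
import math

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flattened pixels (assumed parameters)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)        # clip to keep log() finite
        p_t = p if t == 1 else 1.0 - p         # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights pixels that are already easy
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the overlap ratio of prediction and mask."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def focal_dice_loss(probs, targets, w_focal=1.0, w_dice=1.0):
    """Weighted sum of the two terms; the weights are illustrative only."""
    return (w_focal * focal_loss(probs, targets)
            + w_dice * dice_loss(probs, targets))

# One defect pixel among eight: a prediction that finds it should score lower.
mask = [0, 0, 0, 0, 0, 0, 0, 1]
found = [0.1] * 7 + [0.9]
missed = [0.1] * 8
print(focal_dice_loss(found, mask) < focal_dice_loss(missed, mask))
```

In this sketch the focal term concentrates the gradient on hard, misclassified pixels, while the Dice term scores region overlap and so is less sensitive to how few defect pixels the image contains. This matches the two roles the abstract assigns to the combined loss.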