Abstract: The railway overhead catenary, a key component of the traction power supply system, is essential to the safe operation of trains. Although deep learning-based intrusion detection for railway catenaries has advanced considerably, existing approaches face two primary bottlenecks in complex scenarios: (1) high susceptibility to drastic lighting changes and cluttered backgrounds, and (2) inadequate feature representation for accurately identifying small-scale anomalies. To address the weak texture of small objects, strong structural interference from the background, and large variations in object scale in catenary foreign object detection, this paper presents MFB-YOLO11, a detection method built on a hierarchical collaborative optimization framework that integrates low-level detail enhancement, mid-level context aggregation, and high-level semantic interaction. First, an MLLA Block, which combines linear attention with local-state modeling, is introduced into the backbone and neck to enhance the fine-grained representation of small objects. Second, a multi-scale contextual enhancement structure combined with Focal Modulation is designed at the SPPF stage to improve cross-scale feature aggregation and discrimination in complex backgrounds. Third, BiFormer is incorporated at the high-level feature modeling stage to strengthen selective attention to key regions and global semantic interaction. Experiments on a public dataset yield an overall accuracy of 92.6%, an mAP50 of 94.4%, and an mAP50-95 of 86.2%, outperforming mainstream detectors. On a self-collected real-world dataset, the method achieves an mAP50 of 81.2% and an mAP50-95 of 61.4%, indicating stable generalization to practical scenarios.
Overall, the results suggest that multi-scale context enhancement and attention-driven fusion can effectively improve robustness to background clutter and scale variation, supporting potential deployment in catenary inspection systems for preventive maintenance and safety assurance.