RailNet: A Lightweight Transfer Learning Model for Real-time Rail Component Detection and Defect Segmentation
Abstract
Railroad inspections are critical for ensuring operational safety, as track defects such as missing clips and surface damage can lead to catastrophic failures. Traditional inspection methods are often labor-intensive, time-consuming, and prone to inconsistency. Although deep learning approaches have been introduced for track monitoring, they typically focus on a single task, require extensive retraining to handle multiple tasks, and suffer performance degradation when adapted to new tasks. There is a growing need for lightweight, real-time, multi-functional solutions that can simultaneously detect rail components and segment rail surface defects without compromising accuracy or speed.
To address these challenges, this paper presents RailNet, a lightweight, modular transfer learning framework tailored for real-time railroad inspection. RailNet integrates a frozen pre-trained component detection model with a new trainable module for surface defect segmentation. The trainable module refines feature maps from the frozen backbone and FPN through targeted correction and attention, introducing three key innovations: a Context Rebalancing Module (CRM) that offsets pre-trained biases, a Selective Channel Attention (SCA) mechanism that emphasizes informative channels while minimizing computational cost, and a single-step Upsample Block for efficient high-resolution reconstruction. This design enables the segmentation module to be trained independently without affecting upstream detection, achieving rapid adaptation, high accuracy, and efficient multi-tasking.
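The channel-attention idea described above can be sketched in a few lines. The following is a minimal NumPy illustration assuming SCA follows the common squeeze-and-excitation pattern (global pooling, a bottleneck MLP, and a per-channel sigmoid gate); the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

def selective_channel_attention(fmap, w1, w2):
    """Sketch of SE-style channel gating on a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are bottleneck weights of the
    excitation MLP with reduction ratio r -- an assumption about the
    SCA design, not taken from the paper.
    """
    s = fmap.mean(axis=(1, 2))               # squeeze: one statistic per channel
    z = np.maximum(w1 @ s, 0.0)              # excitation bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # per-channel sigmoid gate in (0, 1)
    return fmap * gate[:, None, None]        # reweight channels of the feature map

# With zero excitation weights every gate is sigmoid(0) = 0.5,
# so the output is exactly half the input feature map.
C, H, W, r = 8, 4, 4, 2
fmap = np.ones((C, H, W))
out = selective_channel_attention(fmap, np.zeros((C // r, C)), np.zeros((C, C // r)))
```

Because the gate only rescales existing channels, such a block adds very few parameters and FLOPs relative to the frozen backbone, which is consistent with the lightweight design goal stated above.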
RailNet was evaluated on a custom rail defect dataset using only a lightweight trainable component (~5 MB, 0.96 GFLOPs). It achieves 93.20% pixel accuracy and 92.59% recall for surface defect segmentation while preserving upstream component (e.g., spike and clip) detection performance (mAP@0.5 of 98.7%), as shown in Table 1. Compared with benchmark models such as SegFormer, YOLOv12, DINOv2, and MobileSAM, RailNet demonstrates superior accuracy and faster inference speed on edge devices (NVIDIA AGX Orin). Ablation studies further confirm the critical roles of the CRM, SCA, and the Upsample Block in enhancing overall performance. As shown in Figure 1, RailNet can simultaneously detect rail components and segment surface damage. These results highlight RailNet's potential as a robust, real-time, energy-efficient solution for multi-task railroad inspection and related industrial applications.
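The two segmentation metrics reported above can be computed as follows. This is a minimal sketch assuming binary (0/1) defect masks, with pixel accuracy taken as the fraction of pixels where prediction and ground truth agree, and recall as the fraction of ground-truth defect pixels recovered; the function name is illustrative.

```python
import numpy as np

def pixel_accuracy_and_recall(pred, gt):
    """Pixel accuracy and defect recall for binary (0/1) masks of equal shape."""
    acc = float((pred == gt).mean())                    # agreement over all pixels
    tp = int(np.logical_and(pred == 1, gt == 1).sum())  # defect pixels correctly found
    fn = int(np.logical_and(pred == 0, gt == 1).sum())  # defect pixels missed
    recall = tp / max(tp + fn, 1)                       # guard against empty masks
    return acc, recall

gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 0, 0, 1]])
acc, rec = pixel_accuracy_and_recall(pred, gt)
# acc = 6/8 = 0.75; recall = 2/3 (two of three defect pixels recovered)
```

Recall is the more safety-relevant of the two for defect segmentation, since a missed defect (false negative) is costlier in a rail-inspection setting than a spurious detection.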
Keywords: Deep Learning, Instance Segmentation, Transfer Learning, Lightweight Model, Railroad Inspection
References
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv Preprint ArXiv:2010.11929.
Du, J., Zhang, R., Gao, R., Nan, L., & Bao, Y. (2024). RSDNet: a new multiscale rail surface defect detection model. Sensors, 24(11), 3579.
Federal Railroad Administration. (2024). Railroad Equipment Accident/Incident Source Data (Form 54).
Ferdousi, R., Laamarti, F., Yang, C., & Saddik, A. El. (2024). A reusable AI-enabled defect detection system for railway using ensembled CNN. Applied Intelligence, 54(20), 9723–9740.
Furlong, T., & Reichard, K. (2023). A Physics-informed, Transfer Learning Approach to Structural Health Monitoring. Annual Conference of the PHM Society, 15(1).
Guo, J., Zhang, S., Qian, Y., & Wang, Y. (2023). A NanoDet Model with Adaptively Weighted Loss for Real-time Railroad Inspection. Annual Conference of the PHM Society, 15(1).
Han, J., & Kwon, D. (2024). Transfer Learning-based Adaptive Diagnosis for Power Plants under Varying Operating Conditions. PHM Society European Conference, 8(1), 6.
Han, Z., Gao, C., Liu, J., Zhang, J., & Zhang, S. Q. (2024). Parameter-efficient fine-tuning for large models: A comprehensive survey. ArXiv Preprint ArXiv:2403.14608.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., & Vasudevan, V. (2019). Searching for mobilenetv3. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1314–1324.
Hurst, A., Lerer, A., Goucher, A. P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A. J., Welihinda, A., Hayes, A., & Radford, A. (2024). Gpt-4o system card. ArXiv Preprint ArXiv:2410.21276.
Li, J., Fu, Y., Yan, D., Ma, S. L., & Sham, C.-W. (2024). An Edge AI System Based on FPGA Platform for Railway Fault Detection. 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE), 1387–1389.
Min, Y., Li, J., & Li, Y. (2023). Rail Surface Defect Detection Based on Improved UPerNet and Connected Component Analysis. Computers, Materials & Continua, 77(1).
Moradi, R., & Groth, K. M. (2020). On the application of transfer learning in prognostics and health management. ArXiv Preprint ArXiv:2007.01965.
Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., & El-Nouby, A. (2023). Dinov2: Learning robust visual features without supervision. ArXiv Preprint ArXiv:2304.07193.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 8748–8763.
Rasheed, A. F., & Zarkoosh, M. (2024). YOLOv11 Optimization for Efficient Resource Utilization. ArXiv Preprint ArXiv:2412.14790.
Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., & Gustafson, L. (2024). Sam 2: Segment anything in images and videos. ArXiv Preprint ArXiv:2408.00714.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention--MICCAI, 234–241.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ArXiv Preprint ArXiv:1409.1556.
Song, G., Hong, S. H., Kyzer, T., & Wang, Y. (2023). An energy consumption auditing anomaly detection system of robotic manipulators based on a generative adversarial network. Annual Conference of the PHM Society, 15(1).
ENSCO. (n.d.). Track Component Imaging System (TCIS) [Online]. Available: https://www.ensco.com/rail/track-component-imaging-system-tcis
Tian, Y., Ye, Q., & Doermann, D. (2025). Yolov12: Attention-centric real-time object detectors. ArXiv Preprint ArXiv:2502.12524.
Wang, W., Zheng, V. W., Yu, H., & Miao, C. (2019). A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2), 1–37.
Wu, Y., Chen, P., Qin, Y., Qian, Y., Xu, F., & Jia, L. (2023). Automatic railroad track components inspection using hybrid deep learning framework. IEEE Transactions on Instrumentation and Measurement, 72, 1–15.
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34, 12077–12090.
Ye, W., Ren, J., Li, C., Liu, W., Zhang, Z., & Lu, C. (2024). Intelligent Detection of Surface Defects in High‐Speed Railway Ballastless Track Based on Self‐Attention and Transfer Learning. Structural Control and Health Monitoring, 2024(1), 2967927.
Zhang, C., Han, D., Zheng, S., Choi, J., Kim, T.-H., & Hong, C. S. (2023). Mobilesamv2: Faster segment anything to everything. ArXiv Preprint ArXiv:2312.09579.
Zhao, J., Yeung, A. W., Ali, M., Lai, S., & Ng, V. T.-Y. (2024). CBAM-SwinT-BL: Small Rail Surface Defect Detection Method Based on Swin Transformer with Block Level CBAM Enhancement. IEEE Access.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostic and Health Management Society advocates open-access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.