An Autonomous Multimodal System for Intelligent Railway Inspection

Published Oct 26, 2025
Boshi Chen, Jiawei Guo, Qian Zhang, Yi Wang

Abstract

We propose an autonomous aerial inspection system to address growing safety concerns about railway infrastructure degradation. Unlike conventional labor- and sensor-intensive methods, our quadrotor integrates a depth camera, a monocular inspection camera, a Global Positioning System (GPS) module, and an onboard computing unit. By combining visual-inertial fusion with GPS, it achieves robust localization even in GPS-denied environments. A lightweight deep learning model built on You Only Look Once v12 (YOLOv12) detects key components such as spikes and clips in real time. To enhance autonomy, we introduce Railway Autonomous Navigation Guided by Embedded Recognition (RANGER), a novel algorithm that reconstructs 3D world coordinates from 2D detections using only onboard sensing, without requiring prior global maps. By fusing detection with localization data, RANGER enables precise track following and stable altitude control in complex or GPS-denied conditions, which reduces hardware demand while ensuring accurate navigation. Overall, the system lowers operational costs, improves scalability, and enables accurate, real-time inspections in complex, unstructured environments.
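
The abstract does not give RANGER's equations, so the following is only a minimal sketch of the general idea of lifting a 2D detection into 3D world coordinates. It assumes a standard pinhole camera model, a depth reading at the detection center (for example from the depth camera), and a camera pose (rotation and translation) supplied by the localization module. All function names, parameters, and numeric values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_camera(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with metric depth into the camera frame
    using the pinhole model (assumed here, not confirmed by the source)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p_cam: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Apply p_world = R * p_cam + t, where (R, t) is the camera pose
    in the world frame as estimated by the localization module."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def detection_to_world(bbox: Tuple[float, float, float, float],
                       depth: float,
                       intrinsics: Tuple[float, float, float, float],
                       rotation: Sequence[Sequence[float]],
                       translation: Vec3) -> Vec3:
    """Lift the center of a 2D bounding box (x_min, y_min, x_max, y_max)
    to a 3D world point. `intrinsics` is (fx, fy, cx, cy)."""
    x_min, y_min, x_max, y_max = bbox
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)
    fx, fy, cx, cy = intrinsics
    p_cam = pixel_to_camera(u, v, depth, fx, fy, cx, cy)
    return camera_to_world(p_cam, rotation, translation)


if __name__ == "__main__":
    # Hypothetical values: a detection centered on the principal point,
    # 2 m away, seen by a camera with identity orientation at (1, 2, 3).
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    point = detection_to_world((310, 230, 330, 250), 2.0,
                               (500.0, 500.0, 320.0, 240.0),
                               identity, (1.0, 2.0, 3.0))
    print(point)
```

In a sketch like this, a sequence of such world points along the track could serve as waypoints for track following, though how RANGER actually uses them is not described in the abstract.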

How to Cite

Chen, B., Guo, J., Zhang, Q., & Wang, Y. (2025). An Autonomous Multimodal System for Intelligent Railway Inspection. Annual Conference of the PHM Society, 17(1). https://doi.org/10.36001/phmconf.2025.v17i1.4319

Keywords

Deep Learning, Railroad Inspection, Unmanned Aerial Vehicle

Section
Poster Presentations