An Autonomous Multimodal System for Intelligent Railway Inspection
Abstract
We propose an autonomous aerial inspection system to address growing safety concerns over railway infrastructure degradation. Unlike conventional labor- and sensor-intensive methods, our quadrotor integrates a depth camera, a monocular inspection camera, a Global Positioning System (GPS) module, and an onboard computing unit. By fusing visual-inertial odometry with GPS measurements when available, it achieves robust localization even in GPS-denied environments. A lightweight deep learning model built on You Only Look Once v12 (YOLOv12) enables real-time detection of key track components such as spikes and clips. To enhance autonomy, we introduce Railway Autonomous Navigation Guided by Embedded Recognition (RANGER), a novel algorithm that reconstructs 3D world coordinates from 2D detections using only onboard sensing, without requiring prior global maps. By fusing detections with localization data, RANGER enables precise track following and stable altitude control in complex or GPS-denied conditions, reducing hardware demands while ensuring accurate navigation. Our system reduces operational costs, enhances scalability, and enables accurate, real-time inspections in complex, unstructured environments.
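The 2D-to-3D reconstruction step described above can be illustrated with a standard pinhole back-projection: a detection's pixel coordinates and its measured depth are lifted to a camera-frame point, then transformed into the world frame using the pose from the state estimator. The following is a minimal sketch only; the intrinsic values, function name, and pose inputs are illustrative assumptions, not the paper's calibration or the actual RANGER implementation.

```python
import numpy as np

# Hypothetical depth-camera intrinsics (focal lengths and principal point);
# illustrative values, not the system's calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

def backproject(u, v, depth, R_wc, t_wc):
    """Lift a 2D detection center (u, v) with metric depth to a 3D world
    point, given the camera-to-world rotation R_wc and translation t_wc
    supplied by a visual-inertial/GPS state estimator."""
    # Pinhole model: ray through the pixel, scaled by the measured depth.
    p_cam = np.array([(u - cx) * depth / fx,
                      (v - cy) * depth / fy,
                      depth])
    # Transform from the camera frame into the world frame.
    return R_wc @ p_cam + t_wc

# Example: a spike detected at the image center, 2 m ahead of the camera,
# with an identity camera pose.
p_world = backproject(320.0, 240.0, 2.0, np.eye(3), np.zeros(3))
print(p_world)  # [0. 0. 2.]
```

In practice the pose (R_wc, t_wc) would come from the fused visual-inertial/GPS estimate, so the recovered component positions are expressed in a consistent world frame that the track-following controller can consume directly.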
Keywords: Deep Learning, Railroad Inspection, Unmanned Aerial Vehicle
Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M. M., & Tardós, J. D. (2021). ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6), 1874–1890.
Cao, S., Lu, X., & Shen, S. (2022). GVINS: Tightly coupled GNSS–visual–inertial fusion for smooth and consistent state estimation. IEEE Transactions on Robotics, 38(4), 2004–2021.
Donoser, M., & Bischof, H. (2006). Efficient maximally stable extremal region (MSER) tracking. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 1, 553–560.
Engel, J., Koltun, V., & Cremers, D. (2017). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611–625.
Federal Railroad Administration. (2024). Railroad Equipment Accident/Incident Source Data (Form 54). https://data.transportation.gov/Railroads/Railroad-Equipment-Accident-Incident-Source-Data-F/aqxq-n5hy/about_data
Frodge, S. L., DeLoach, S. R., Remondi, B., Lapucha, D., & Barker, R. A. (1994). Real-time on-the-fly kinematic GPS system results. Navigation, 41(2), 175–186.
Guo, J., Zhang, S., Qian, Y., & Wang, Y. (2023). A NanoDet model with adaptively weighted loss for real-time railroad inspection. Annual Conference of the PHM Society, 15(1).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, 2961–2969.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
Jocher, G. (2020). Ultralytics YOLOv5. https://doi.org/10.5281/zenodo.3908559
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
Lee, G. Y., Dam, T., Ferdaus, M. M., Poenar, D. P., & Duong, V. N. (2023). WATT-EffNet: A lightweight and accurate model for classifying aerial disaster images. IEEE Geoscience and Remote Sensing Letters, 20, 1–5.
Liu, K. (2023). Learning-based defect recognitions for autonomous UAV inspections. arXiv preprint arXiv:2302.06093.
Ngeljaratan, L., Bas, E. E., & Moustafa, M. A. (2024). Unmanned aerial vehicle-based structural health monitoring and computer vision-aided procedure for seismic safety measures of linear infrastructures. Sensors, 24(5), 1450.
Nguyen, T., Shivakumar, S. S., Miller, I. D., Keller, J., Lee, E. S., Zhou, A., Özaslan, T., Loianno, G., Harwood, J. H., & Wozencraft, J. (2019). MAVNet: An effective semantic segmentation micro-network for MAV-based tasks. IEEE Robotics and Automation Letters, 4(4), 3908–3915.
Qin, T., Li, P., & Shen, S. (2018). VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4), 1004–1020.
Qiu, L., Zhu, M., Park, J., & Jiang, Y. (2024). Non-interrupting rail track geometry measurement system using UAV and LiDAR. arXiv preprint arXiv:2410.10832.
Romera, E., Alvarez, J. M., Bergasa, L. M., & Arroyo, R. (2017). ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1), 263–272.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Tian, Y., Ye, Q., & Doermann, D. (2025). YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524.
Wang, J., & Yu, N. (2022). SSD-Faster Net: A hybrid network for industrial defect inspection. arXiv preprint arXiv:2207.00589.
Wang, T., Zhang, Z., Yang, F., & Tsui, K.-L. (2021). Automatic rail component detection based on AttnConv-Net. IEEE Sensors Journal, 22(3), 2379–2388.
Weng, Y., Li, Z., Chen, X., He, J., Liu, F., Huang, X., & Yang, H. (2023). A railway track extraction method based on improved DeepLabV3+. Electronics, 12(16), 3500.
Xu, Z., Chen, B., Zhan, X., Xiu, Y., Suzuki, C., & Shimada, K. (2023). A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles. IEEE Robotics and Automation Letters, 8(8), 4983–4990.
Zheng, D., Li, L., Zheng, S., Chai, X., Zhao, S., Tong, Q., Wang, J., & Guo, L. (2021). A defect detection method for rail surface and fasteners based on deep convolutional neural network. Computational Intelligence and Neuroscience, 2021(1), 2565500.
Zhou, X., Wang, Z., Ye, H., Xu, C., & Gao, F. (2020). EGO-Planner: An ESDF-free gradient-based local planner for quadrotors. IEEE Robotics and Automation Letters, 6(2), 478–485.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author's copyright; rather, it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions, including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.