Active Sim-to-Real Gap Reduction for Industrial Inspection via Digital Twin and Embedding Analysis
Abstract
Simulation-based training is increasingly used in automated industrial inspection, where collecting and annotating real-world inspection data is costly and often impractical. While synthetic data generated from digital twins enables scalable training, models trained solely in simulation suffer from a significant sim-to-real gap under real inspection conditions such as varying lighting, surface properties, and sensor noise. In this work, we propose a data-efficient sim-to-real adaptation framework that combines representative sample selection via k-determinantal point processes (k-DPP) with embedding-level alignment using Kullback–Leibler (KL) divergence. The key idea is to actively identify a small set of representative synthetic samples, acquire the corresponding real images, and align their latent feature representations while retaining the coverage provided by the larger synthetic dataset. We first train an RF-DETR (Detection Transformer) detector on 550 synthetic inspection images, achieving near-perfect performance in simulation but only 0.2516 mean Average Precision (mAP) on real-world images. Using only 50 paired real images (approximately 10% of the synthetic training set) together with 500 unpaired synthetic images, the proposed method increases real-world mAP from 0.2516 to 0.8853. The k-DPP sampling strategy maximizes the diversity of the selected samples, reducing the risk of bias introduced by limited real-world data, while KL-based embedding alignment further reduces the domain discrepancy between synthetic and real images. The proposed framework provides a lightweight and practical approach to reducing sim-to-real gaps in a representative industrial inspection setting where real data collection is limited.
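As a rough illustration of the two ingredients named in the abstract, the sketch below selects a diverse subset of embeddings and then scores a KL-based alignment term on the selected pairs. It is not the authors' implementation. Several choices are assumptions made only for this example: a deterministic greedy MAP approximation stands in for true k-DPP sampling, the similarity kernel is an RBF kernel with a hypothetical `gamma`, embeddings are turned into distributions with a softmax, the KL direction is real-to-synthetic, and all function names and toy data are invented.

```python
import math


def rbf_kernel(points, gamma=1.0):
    """Similarity kernel L[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed choice)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-gamma * sq_dist(a, b)) for b in points] for a in points]


def greedy_kdpp_select(kernel, k):
    """Greedy MAP approximation to a k-DPP: pick up to k indices that
    approximately maximize det(kernel[S][S]) via incremental Cholesky updates."""
    n = len(kernel)
    gains = [kernel[i][i] for i in range(n)]  # det ratio gained by adding item i
    coeffs = [[] for _ in range(n)]           # Cholesky rows w.r.t. selected items
    selected = []
    for _ in range(min(k, n)):
        remaining = [i for i in range(n) if i not in selected]
        j = max(remaining, key=lambda i: gains[i])
        if gains[j] <= 1e-12:                 # no remaining item adds volume
            break
        selected.append(j)
        d_j = math.sqrt(gains[j])
        for i in remaining:
            if i == j:
                continue
            dot = sum(a * b for a, b in zip(coeffs[j], coeffs[i]))
            e = (kernel[j][i] - dot) / d_j
            coeffs[i].append(e)
            gains[i] -= e * e                 # similar items lose most of their gain
    return selected


def softmax(values):
    """Numerically stable softmax; turns an embedding into a distribution."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy synthetic embeddings: two tight pairs plus one isolated point.
synthetic = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
chosen = greedy_kdpp_select(rbf_kernel(synthetic), k=3)

# Hypothetical "real" embeddings for the chosen samples: a perturbed copy.
real = {i: [synthetic[i][0] + 0.5, synthetic[i][1] - 0.2] for i in chosen}

# Mean KL between paired real and synthetic embedding distributions.
alignment_loss = sum(
    kl_divergence(softmax(real[i]), softmax(synthetic[i])) for i in chosen
) / len(chosen)
print(chosen, alignment_loss)
```

On this toy data the greedy step avoids picking both members of a near-duplicate pair, which is the diversity behavior the abstract attributes to k-DPP selection. In a training setup the alignment term would presumably be added to the detection loss and computed on detector features rather than on hand-written vectors.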
Keywords: Sim-to-Real Transfer, Digital Twin, Feature Alignment, Active Data Acquisition

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.