Learning the Language of Vibration: A Self-Supervised Transformer Foundation Model for PHM
Abstract
Industrial prognostics and health management (PHM) increasingly relies on vibration-based deep learning, yet field deployment remains limited by two practical constraints: (i) fault labels are scarce and expensive, and (ii) models trained on one machine or dataset often degrade under distribution (domain) shift caused by different sensors, sampling rates, loads, and signal conventions. These constraints motivate vibration foundation models: reusable encoders trained once on large collections of unlabeled raw vibration recordings and adapted to new assets with minimal supervision. This paper presents VibFM, a Transformer encoder trained via self-supervised masked spectrogram modeling in the spirit of masked language modeling and masked autoencoders. Raw waveforms from 16 open datasets totaling ≈ 400 hours are standardized into 128 × 128 log-magnitude short-time Fourier transform (STFT) spectrograms and paired with a compact conditioning vector that encodes sampling rate and time/frequency resolution. Pre-training reconstructs masked time–frequency patches, encouraging the encoder to capture transferable vibration primitives such as persistent narrowband ridges, modulation sidebands, and impulsive transients. Transfer is evaluated on the held-out Paderborn University KAt DataCenter bearing benchmark (excluded from pre-training) using leakage-resistant bearing-level splits. On three-class fault diagnosis, frozen VibFM features substantially improve over training from scratch, while end-to-end fine-tuning provides the strongest performance. For reconstruction-based anomaly detection, adapting a decoder on healthy target data yields reconstruction-error scores that separate healthy from damaged states across operating conditions. Masked reconstructions and pooling-attention visualizations provide qualitative audits of learned time–frequency structure, and the limits of these interpretability probes are discussed.
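The preprocessing and pre-training objective summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The 128 × 128 log-magnitude STFT target, the conditioning on sampling rate and time/frequency resolution, and the masking of time–frequency patches come from the abstract; the Hann window, the FFT length of 254 (which yields 128 frequency bins), the 16 × 16 patch size, the 75% mask ratio, the exact layout of the conditioning vector, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(x, n_fft=254, n_frames=128, eps=1e-6):
    """Log-magnitude STFT of shape (n_fft // 2 + 1, n_frames).

    Assumption: n_fft=254 is picked so that rfft returns 128 bins; the hop is
    derived from the waveform length so that exactly n_frames frames fit.
    """
    x = np.asarray(x, dtype=np.float64)
    hop = (len(x) - n_fft) // (n_frames - 1)
    if hop < 1:
        raise ValueError("waveform too short for the requested number of frames")
    window = np.hanning(n_fft)  # assumed window type
    starts = np.arange(n_frames) * hop
    frames = np.stack([x[s:s + n_fft] for s in starts]) * window
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, 128)
    return np.log(magnitude + eps).T, hop            # (freq, time)

def conditioning_vector(fs, n_fft, hop):
    """Hypothetical layout: log10 sampling rate, time step (s), bin width (Hz)."""
    return np.array([np.log10(fs), hop / fs, fs / n_fft])

def mask_patches(spec, patch=16, ratio=0.75, rng=None):
    """Zero out a random subset of non-overlapping patch x patch tiles."""
    rng = np.random.default_rng() if rng is None else rng
    grid_h, grid_w = spec.shape[0] // patch, spec.shape[1] // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(ratio * n_patches))
    chosen = rng.choice(n_patches, size=n_masked, replace=False)
    grid = np.zeros(n_patches, dtype=bool)
    grid[chosen] = True
    grid = grid.reshape(grid_h, grid_w)
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = grid.repeat(patch, axis=0).repeat(patch, axis=1)
    return np.where(pixel_mask, 0.0, spec), pixel_mask

def masked_mse(recon, spec, pixel_mask):
    """Mean squared reconstruction error over the masked region only."""
    diff = recon - spec
    return float(np.mean(diff[pixel_mask] ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 12_000  # assumed sampling rate for the synthetic example
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 160 * t) + 0.1 * rng.standard_normal(fs)
    spec, hop = log_spectrogram(x)
    cond = conditioning_vector(fs, 254, hop)
    masked, mask = mask_patches(spec, rng=rng)
    print(spec.shape, cond.shape, int(mask.sum()))
```

In a setup like this, an encoder would see `masked` together with `cond`, and a decoder would be trained to minimize `masked_mse` against the original `spec`. The same masked-region error, computed with a decoder adapted on healthy data, is one plausible form of the reconstruction-error anomaly score the abstract mentions.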
Keywords: Foundation Model, Vision Transformer, Masked Spectrogram Modeling, Transfer Learning

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostic and Health Management Society advocates open-access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions including the following: