Learning the Language of Vibration: A Self-Supervised Transformer Foundation Model for PHM

Published Jul 3, 2026
Giuseppe Mannone, Paula Fischer, Martin Dazer

Abstract

Industrial prognostics and health management (PHM) increasingly relies on vibration-based deep learning, yet field deployment remains limited by two practical constraints: (i) fault labels are scarce and expensive, and (ii) models trained on one machine or dataset often degrade under distribution (domain) shift caused by differences in sensors, sampling rates, loads, and signal conventions. These constraints motivate vibration foundation models: reusable encoders trained once on large collections of unlabeled raw vibration recordings and adapted to new assets with minimal supervision. This paper presents VibFM, a Transformer encoder trained via self-supervised masked spectrogram modeling in the spirit of masked language modeling and masked autoencoders. Raw waveforms from 16 open datasets totaling ≈ 400 hours are standardized into 128 × 128 log-magnitude short-time Fourier transform (STFT) spectrograms, each paired with a compact conditioning vector that encodes the sampling rate and the time/frequency resolution. Pre-training reconstructs masked time–frequency patches, which encourages the encoder to capture transferable vibration primitives such as persistent narrowband ridges, modulation sidebands, and impulsive transients. Transfer is evaluated on the Paderborn University KAt-DataCenter bearing benchmark, which is excluded from pre-training, using leakage-resistant bearing-level splits. On three-class fault diagnosis, frozen VibFM features substantially improve over training from scratch, while end-to-end fine-tuning provides the strongest performance. For reconstruction-based anomaly detection, adapting a decoder on healthy target data yields reconstruction-error scores that separate healthy from damaged states across operating conditions. Masked reconstructions and pooling-attention visualizations provide qualitative audits of the learned time–frequency structure, and the limits of these interpretability probes are discussed.
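
The input pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the window length (254 samples, chosen so that the one-sided FFT has 128 bins), the Hann window, the 16 × 16 patch size, the 75% mask ratio, the layout of the conditioning vector, and all function names are assumptions made for illustration, and a constant-valued prediction stands in for the Transformer decoder.

```python
import numpy as np

def log_stft_spectrogram(x, n_fft=254, n_frames=128, eps=1e-8):
    """Log-magnitude STFT of shape (n_fft // 2 + 1, n_frames) = (128, 128)."""
    x = np.asarray(x, dtype=np.float64)
    # Assumed convention: pick the hop so that n_frames windows span the signal.
    hop = (len(x) - n_fft) // (n_frames - 1)
    if hop < 1:
        raise ValueError("signal too short for the requested number of frames")
    window = np.hanning(n_fft)
    starts = np.arange(n_frames) * hop
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (time, freq)
    return np.log(magnitude + eps).T, hop             # (freq, time)

def conditioning_vector(fs, n_fft, hop):
    """Hypothetical encoding: log sampling rate, time step, frequency step."""
    return np.array([np.log10(fs), np.log10(hop / fs), np.log10(fs / n_fft)])

def patchify(spec, patch=16):
    """Split a (F, T) spectrogram into flattened non-overlapping patches."""
    f, t = spec.shape
    gf, gt = f // patch, t // patch
    blocks = spec.reshape(gf, patch, gt, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(gf * gt, patch * patch)

def random_patch_mask(n_patches, mask_ratio, rng):
    """Boolean mask with True marking the patches hidden from the encoder."""
    n_mask = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_mask, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Synthetic one-second record: a tone (narrowband ridge) plus periodic impulses.
fs, n_fft = 12_000, 254
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 160.0 * t) + 0.05 * rng.standard_normal(fs)
x[::1200] += 3.0

spec, hop = log_stft_spectrogram(x, n_fft=n_fft)
cond = conditioning_vector(fs, n_fft, hop)
patches = patchify(spec)                              # (64, 256)
mask = random_patch_mask(len(patches), 0.75, rng)

# Stand-in "decoder": predict the mean of the visible patches everywhere.
baseline = np.full_like(patches, patches[~mask].mean())
baseline_loss = masked_mse(baseline, patches, mask)
print(spec.shape, patches.shape, int(mask.sum()), cond.shape)
```

In this sketch the same masked error serves as the pre-training loss and, once a decoder has been adapted on healthy data, as a reconstruction-error anomaly score of the kind the abstract describes; a trained model would replace the constant baseline prediction.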

How to Cite

Mannone, G., Fischer, P., & Dazer, M. (2026). Learning the Language of Vibration: A Self-Supervised Transformer Foundation Model for PHM. PHM Society European Conference, 9(1), 1–15. https://doi.org/10.36001/phme.2026.v9i1.4912

Keywords

Foundation Model, Vision Transformer, Masked Spectrogram Modeling, Transfer Learning

Section
Technical Papers