Rethinking Reliability in Terms of Margins



Published Oct 26, 2023
Diego Mandelli Congjian Wang Koushik Manjunatha Vivek Agarwal Linyu Lin


Current reliability approaches were designed to assess and quantify the reliability of complex systems such as nuclear power plants (NPPs). These approaches are generally based on classical Boolean logic structures such as event trees (ETs) and fault trees (FTs) [Rausand, 2020]. The outcome obtained by combining FTs and ETs is the set of minimal cut sets (MCSs), with each MCS representing a unique combination of basic events (BEs) that leads to an undesired outcome (e.g., core damage). Probabilistic evaluation of an MCS is performed by evaluating the product of the probability values associated with each BE. A relevant factor here is that the probability values associated with BEs used in the plant models are updated at least every 4 years based on past operational experience and through the use of a Bayesian statistical process [Siu, 1998]. Hence, the probability value of a BE associated with a physical asset (e.g., a centrifugal pump or motor-operated valve) in no way reflects that asset’s actual condition and performance.
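As an illustration of the MCS evaluation described above, the following Python sketch computes the probability of each MCS as the product of its BE probabilities. The BE names, probability values, and cut sets are hypothetical and chosen only for illustration; the sketch also assumes that BEs are statistically independent, and the top-event estimate uses the standard min-cut upper bound, which is an addition not discussed in the abstract.

```python
from math import prod

# Hypothetical basic-event (BE) probabilities; illustrative values only.
be_prob = {
    "pump_A_fails": 1.0e-3,
    "pump_B_fails": 1.0e-3,
    "valve_fails": 5.0e-4,
}

# Hypothetical minimal cut sets (MCSs): each is a set of BEs that,
# occurring together, leads to the undesired outcome.
minimal_cut_sets = [
    {"pump_A_fails", "pump_B_fails"},
    {"valve_fails"},
]


def mcs_probability(mcs, probs):
    """Product of the BE probabilities in one MCS (assumes independent BEs)."""
    return prod(probs[be] for be in mcs)


def top_event_upper_bound(cut_sets, probs):
    """Min-cut upper bound on the top-event probability: 1 - prod(1 - P(MCS))."""
    return 1.0 - prod(1.0 - mcs_probability(m, probs) for m in cut_sets)


if __name__ == "__main__":
    for mcs in minimal_cut_sets:
        print(sorted(mcs), mcs_probability(mcs, be_prob))
    print("top-event upper bound:", top_event_upper_bound(minimal_cut_sets, be_prob))
```

Note that the inputs are fixed numbers taken from historical data: nothing in this calculation changes when a specific pump starts to degrade, which is the limitation the abstract points to.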

This fact plays a major role in the application of plant reliability models to support risk-informed decisions. With the particular goal of reducing operation and maintenance costs, existing NPPs are moving from corrective and periodic maintenance toward new types of predictive maintenance strategies [Agarwal, 2021]. This transition is designed such that maintenance is conducted only when the asset requires it (i.e., prior to undergoing imminent failure). Although these benefits cannot be achieved through current reliability modelling methods and currently employed reliability data, they can be achieved by employing asset-monitoring sensors, automated data acquisition systems, data analysis methods, and improved decision-making processes. Combined, these resources can provide precise information on the health of an asset, track its degradation trends, and estimate its expected failure time. Based on such information, maintenance operations can be scheduled and performed for each asset on an as-needed basis. This dynamic context of predictive maintenance operations requires new methods of data analysis, the propagation of asset health information from the asset level to the system level, and the optimization of plant resources.

This paper provides an alternative reliability approach designed for a predictive maintenance context in which a direct link is created between monitoring data and decision-making. Rather than thinking of reliability in terms of system/asset probability of failure, we propose a reliability mindset based on the concept of margin [Mandelli, 2023]. An asset’s health is quantified by determining its margin, based on the asset’s current and historical monitoring data. The margin values of the monitored assets are then propagated through system reliability models (e.g., FTs or reliability block diagrams) to identify the assets that are most critical to guaranteeing system operation. We show how a margin-based approach can be used to assess asset health, based solely on current and historical monitoring data (e.g., condition-based, anomaly detection, diagnostic, and prognostic data) [Xingang, 2021]. A margin-based approach directly addresses the limitations of classical reliability modelling approaches and provides a snapshot of system health, given the availability of monitoring data. These two different approaches are designed to address different types of decisions: classical reliability models support static decisions (e.g., a set frequency of periodic maintenance or surveillance operations) based on past operational experience, whereas a margin-based approach directly supports dynamic decisions involving maintenance operations that should only be performed when necessary, based on monitoring data (i.e., a predictive maintenance context).
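To make the idea of propagating margins from the asset level to the system level more concrete, here is a minimal Python sketch. It is not the formulation used in the paper: the abstract does not state how margin is computed or combined, so everything below is an assumption made for illustration. Specifically, the sketch assumes a normalized margin (1 at a nominal condition, 0 at a limiting condition), takes the minimum margin for assets in series, and takes the maximum margin for redundant assets in parallel. All asset names and sensor values are hypothetical.

```python
def asset_margin(current, nominal, limit):
    """Assumed normalized margin: 1.0 at the nominal value, 0.0 at the limit.

    Works whether the monitored quantity degrades upward (limit > nominal)
    or downward (limit < nominal). The result is clipped to [0, 1].
    """
    m = (limit - current) / (limit - nominal)
    return max(0.0, min(1.0, m))


def series_margin(margins):
    """Assumption: a series system is only as healthy as its weakest asset."""
    return min(margins)


def parallel_margin(margins):
    """Assumption: a redundant group is as healthy as its best asset."""
    return max(margins)


if __name__ == "__main__":
    # Hypothetical monitoring data (e.g., vibration in mm/s, stroke time in s).
    pump_a = asset_margin(current=5.0, nominal=2.0, limit=8.0)
    pump_b = asset_margin(current=6.5, nominal=2.0, limit=8.0)
    valve = asset_margin(current=14.0, nominal=10.0, limit=30.0)

    # Two redundant pumps feeding one valve: (pump_a OR pump_b) in series with valve.
    system = series_margin([parallel_margin([pump_a, pump_b]), valve])
    print(pump_a, pump_b, valve, system)
```

Under these assumptions the system margin is driven by the healthier of the two pumps, so a change in that pump's monitoring data moves the system-level value directly, whereas the BE probabilities in a classical model would stay unchanged until the next periodic update.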

How to Cite

Mandelli, D., Wang, C., Manjunatha, K., Agarwal, V., & Lin, L. (2023). Rethinking Reliability in Terms of Margins. Annual Conference of the PHM Society, 15(1).



reliability, decision making

Agarwal, V., Araseethota Manjunatha, K., Smith, J. A., Gribok, A. V., Yadav, V., Palas, H., Yarlett, M., Goss, N., Yurkovich, S., Diggans, B., Lybeck, N. J., Pennington, M., & Zwiryk, N. (2021a). Machine learning and economic models to enable risk-informed condition based maintenance of a nuclear plant asset. Idaho National Laboratory Technical Report, INL/EXT 21-61984.

Lewis, A., & Groth, K. M. (2022). “Metrics for evaluating the performance of complex engineering system health monitoring models,” Reliability Engineering & System Safety, vol. 223.

Hjartarson, T., & Shawn, O. (2006). “Predicting Future Asset Condition Based on Current Health Index and Maintenance Level,” in ESMO 2006–2006 IEEE 11th International Conference on Transmission & Distribution Construction, Operation and Live-Line Maintenance.

Mandelli, D., Wang, C., & Hess, S. (2023). On the Language of Reliability: A System Engineer Perspective. Nuclear Technology.

Melchers, R., & Beck, A. (2018). Structural Reliability Analysis and Prediction (3rd ed.). Wiley.

Pinciroli, L., Baraldi, P., & Zio, E. (2023). “Maintenance optimization in industry 4.0,” Reliability Engineering & System Safety, vol. 234.

Rausand, M., Barros, A., & Hoyland, A. (2020). System Reliability Theory: Models, Statistical Methods, and Applications. Wiley.

Siu, N., & Kelly, D. (1998). Bayesian Parameter Estimation in Probabilistic Risk Assessment. Reliability Engineering and System Safety, 62 (1–2), pp. 89–116.

Xingang, Z., Kim, J., Warns, K., Wang, X., Ramuhalli, P., Cetiner, S., Kang, H. G., & Golay, M. (2021). Prognostics and Health Management in Nuclear Power Plants: An Updated Method-Centric Review with Special Focus on Data-Driven Methods. Frontiers in Energy Research, 9, 696785.

Zio, E. (2022). “Prognostics and Health Management (PHM): Where are we and where do we (need to) go in theory and practice,” Reliability Engineering & System Safety, vol. 218.
Technical Research Papers