Simple Metrics for Evaluating and Conveying Prognostic Model Performance To Users With Varied Backgrounds



Published Oct 14, 2013
Michael E. Sharp


The need for standardized methods for comparing and evaluating new models and algorithms has been recognized for nearly as long as there have been models and algorithms to evaluate. Conveying the results of such comparisons to people not intimately familiar with the methods and systems can also be challenging, as nomenclature and representative values may vary from case to case. Many predictive models rely primarily on minimizing simple error measures such as the Mean Squared Error (MSE) for their performance evaluation. This, however, may not provide all of the necessary information when the criticality, or importance, of a model’s predictions changes over time. Such is the case with prognostic models: predictions early in life can have relatively larger errors with lower impact on the operations of a system than a similar error near the end of life. For example, an error of 10 hours in the prediction of Remaining Useful Life (RUL) when the predicted value is 1000 hours is far less significant than when the predicted value is 25 hours. Because the importance of a prognostic prediction depends on where it falls in the query unit’s lifetime, any evaluation metric should capture and reflect this evolution of importance. This work briefly explores some of the existing metrics and algorithms for evaluation of prognostic models, and then offers a series of alternative metrics that provide clear and intuitive measures representing the quality of model performance on a scale that is independent of the application. This provides a method for relating performance to users and evaluators with a wide range of backgrounds and expertise without the need for specific knowledge of the system in question, aiding collaboration and cross-field use of prognostic methodologies.
Four primary evaluation metrics can be used to capture information regarding both timely precision and accuracy for any series or set of prognostic predictions of RUL. These metrics, the Weighted Error Bias, the Weighted Prediction Spread, the Confidence Interval Coverage, and the Confidence Convergence Horizon, are all detailed in this work and are designed so that they can easily be combined into a single representative “score” of the overall performance of a prediction set and, by extension, of the prognostic model that produced it. Designed to be informative separately or as a group, this set of performance evaluation metrics can be used to quickly compare different prognostic prediction sets, not only for the same query set but just as simply across differing query data sets, by scaling all predictions and metrics to relative values based on the individual query cases.
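
The 10-hour example in the abstract can be made concrete with a small sketch. It is only an illustration of why the same absolute error matters more near end of life; the function name `relative_error` and the choice to scale by the predicted RUL are assumptions made here, not the definitions of the four metrics detailed in the paper.

```python
def relative_error(error_hours: float, predicted_rul_hours: float) -> float:
    """Absolute error expressed as a fraction of the predicted RUL.

    Illustrative only: this scaling is an assumption for the example,
    not the paper's Weighted Error Bias or any of its other metrics.
    """
    if predicted_rul_hours <= 0:
        raise ValueError("predicted RUL must be positive")
    return abs(error_hours) / predicted_rul_hours


# The same 10-hour error at two points in a unit's life.
early = relative_error(10, 1000)  # predicted RUL of 1000 hours
late = relative_error(10, 25)     # predicted RUL of 25 hours

print(f"early-life relative error: {early:.0%}")
print(f"late-life relative error:  {late:.0%}")
```

Under this scaling the early-life error is 1% of the predicted RUL while the late-life error is 40%, which is the kind of difference that an unweighted measure such as MSE would not distinguish.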

How to Cite

Sharp, M. E. (2013). Simple Metrics for Evaluating and Conveying Prognostic Model Performance To Users With Varied Backgrounds. Annual Conference of the PHM Society, 5(1).



