Condition Monitoring & Predictive Maintenance: Cost-Benefit Tradeoffs

In a previous blog post, we discussed the basics of the Potential-Failure (P-F) curve, which refers to the interval between the detection of a potential failure and occurrence of a functional failure. In this post we’ll discuss the cost-benefit tradeoffs of various maintenance approaches.

In general, the goal is to maximize the P-F interval, which is the time between the first symptoms of impending failure and the functional failure taking place. In other words, you want to become aware of an impending failure as soon as possible to allow more time for action. This, however, must be balanced with the cost of the methods of prevention, inspection, and detection.

There is a trade-off between the cost of systems that detect and predict failures and how soon they detect the condition. Generally, the earlier the detection/prediction, the more expensive it is. However, the longer it takes to detect an impending failure (i.e., the more the asset’s condition degrades), the more expensive the repair.

Every asset has its own trade-off between the cost of failure prevention (detection/prediction) and the cost of failure. Some assets may call for earlier detection methods with higher prevention costs, such as condition monitoring and analytics systems, because they are expensive to repair (see the Prevention-1 and Repair-1 curves in the Cost-Failure/Time chart). Others may be better suited to cheaper but later detection, or even a “run-to-failure” model, because they are cheaper to repair (the Prevention-2 and Repair-2 curves in the Cost-Failure/Time chart).
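To make the trade-off concrete, here is a minimal Python sketch. Everything numeric in it is an assumption for illustration: the linear prevention cost, the exponentially decaying repair cost, the `decay_days` constant, and the dollar figures are hypothetical and are not taken from the Cost-Failure/Time chart. The sketch simply searches for the detection lead time (days of warning before functional failure) that minimizes prevention cost plus repair cost.

```python
import math


def total_cost(lead_days, failure_cost, monitoring_cost_per_day, decay_days=20.0):
    """Prevention cost plus repair cost for a given detection lead time.

    Assumed model (hypothetical, for illustration only):
      - prevention cost grows linearly with how early we want to detect
      - repair cost shrinks exponentially the earlier the problem is caught
    """
    prevention = monitoring_cost_per_day * lead_days
    repair = failure_cost * math.exp(-lead_days / decay_days)
    return prevention + repair


def best_lead_time(failure_cost, monitoring_cost_per_day, max_lead_days=90):
    """Return the whole-day lead time with the lowest total cost."""
    return min(
        range(max_lead_days + 1),
        key=lambda days: total_cost(days, failure_cost, monitoring_cost_per_day),
    )


# Two hypothetical assets with the same monitoring cost but different failure costs.
expensive_asset = best_lead_time(failure_cost=100_000, monitoring_cost_per_day=500)
cheap_asset = best_lead_time(failure_cost=2_000, monitoring_cost_per_day=500)
print(expensive_asset, cheap_asset)
```

Under these assumed figures, the expensive-to-repair asset justifies roughly 46 days of warning, while the cheap-to-repair asset comes out at 0 days, which corresponds to a run-to-failure model. Real curves will differ by asset; the point is only that the minimum of the combined cost moves with the cost of failure.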


There are four basic maintenance approaches: Reactive, Preventative, Condition-Based, and Predictive Analytics.

The Reactive approach has low or even no cost to implement but can result in a high repair/failure cost because no action is taken until the asset has reached a fault state. This approach might be appropriate when the cost of monitoring systems is very high compared to the cost of repairing or replacing the asset. As a general guideline, the Reactive approach is not a good strategy for critical and/or high-value assets, because a failure of such assets is so costly.

Reactive approaches:

      • Offer no visibility
      • Fix only if it breaks – low overall equipment effectiveness (OEE)
      • High downtime
      • Uncertainty of failures


The Preventative approach (maintenance at time-based intervals) may be appropriate when failures are age-related and maintenance can be performed at regular intervals before anticipated failures occur. This approach has two drawbacks: 1) the cost and time of preventative maintenance can be high; and 2) studies show that only 18% of failures are age-related (source: ARC Advisory Group). The other 82% of failures are “random,” caused by improper design/installation, operator error, quality issues, machine overuse, etc. As a result, the Preventative approach may spend time and money on unnecessary work and still fail to prevent expensive failures in critical or high-value assets.

Preventative approaches:

      • Scheduled tune ups
      • Higher equipment longevity
      • Reduced downtime compared to reactive mode


The Condition-Based approach attempts to address failures regardless of whether they are age-based or random. Assets are monitored for one or more potential failure indicators, such as vibration, temperature, current/voltage, pressure, etc. The data is often sent to a PLC, local HMI, special processor, or the cloud through an edge gateway. Predefined limits are set and alerts (alarm, operator message, maintenance/repair) are only sent when a limit is reached. This approach avoids unnecessary maintenance and can give warning before a failure occurs. Condition-based monitoring can be very cost-effective, though very sophisticated solutions can be expensive. It is a good solution when the cost of failure is medium or high and known indicators provide a reliable warning of impending failure.

Condition-based approaches:

      • Based on condition (PdM)
      • Enables predictive maintenance
      • Improves OEE, equipment longevity
      • Drastically reduces unplanned downtime
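The predefined-limit alerting described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `Limit` class, the two-level warning/alarm scheme, the indicator name, and the numeric limits are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limit:
    """Predefined limits for one monitored indicator (values are assumptions)."""
    indicator: str
    warning: float
    alarm: float


def check_reading(limit: Limit, value: float) -> Optional[str]:
    """Return an alert level only when a limit is reached, otherwise None."""
    if value >= limit.alarm:
        return "alarm"
    if value >= limit.warning:
        return "warning"
    return None


# Hypothetical vibration limits in mm/s; real limits depend on the asset.
vibration = Limit(indicator="vibration_mm_s", warning=4.5, alarm=7.1)

for reading in (3.0, 5.0, 7.1):
    level = check_reading(vibration, reading)
    if level is not None:
        print(f"{vibration.indicator}: {reading} -> {level}")
```

Because nothing is reported while readings stay below the limits, maintenance is triggered by the asset's condition rather than by the calendar.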

Predictive Analytics

Predictive Analytics is the most sophisticated approach and attempts to learn from machine performance to predict failures. It utilizes data gathered through Condition Monitoring, and then applies analysis or AI/Machine Learning to uncover patterns to predict failures before they occur. The hardware and software to implement Predictive Analytics can be expensive, and this method is best for high-value/critical assets and expensive potential failures.

Predictive Analytics approaches:

      • Based on patterns – stored information
      • Based on machine learning
      • Improves OEE, equipment longevity
      • Avoids downtime
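As a very simplified stand-in for the pattern analysis described above, the Python sketch below fits a straight line to stored condition-monitoring readings and extrapolates when the trend would reach a limit. This is plain least-squares trend extrapolation, not machine learning, and it is far simpler than a real Predictive Analytics product; the function names, sample readings, and limit are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def hours_until_limit(times, values, limit):
    """Estimate hours from the last reading until the trend reaches `limit`.

    Returns None when the fitted trend is flat or improving, since the
    line then never reaches the limit.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing_time = (limit - intercept) / slope
    return max(0.0, crossing_time - times[-1])


# Hypothetical stored readings: operating hours and vibration in mm/s.
hours = [0, 10, 20, 30]
vibration = [2.0, 2.5, 3.0, 3.5]
print(hours_until_limit(hours, vibration, limit=5.0))
```

With these made-up readings, the fitted trend reaches the limit about 30 operating hours after the last sample. A production system would typically combine several indicators and a more capable model, but the idea of using historical data to estimate the remaining time is the same.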

Each user will need to evaluate the unique attributes of their assets and decide on the best approach and trade-offs of the cost of prevention (detection of potential failure) against the cost of repair/failure. In general, a Reactive approach is only best when the cost of failure is very low. Preventative maintenance may be appropriate when failures are clearly age-related. And advanced approaches such as Condition Based monitoring and Predictive Analytics are best when the cost of repair or failure is high.
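The guidance in the paragraph above can be written down as a small Python helper. This is only one possible encoding of those general rules: the function name, the `failure_cost` labels, and the order of the checks are assumptions, and a real decision would weigh actual costs for each asset.

```python
def suggest_approach(failure_cost: str, age_related: bool) -> str:
    """Map the general guidance onto a suggested maintenance approach.

    failure_cost: one of "very_low", "moderate", "high" (hypothetical labels).
    age_related:  True when failures are clearly age-related.
    """
    if failure_cost not in ("very_low", "moderate", "high"):
        raise ValueError(f"unknown failure_cost label: {failure_cost!r}")
    if failure_cost == "very_low":
        return "Reactive"
    if failure_cost == "high":
        return "Condition-Based or Predictive Analytics"
    if age_related:
        return "Preventative"
    return "Evaluate case by case"


print(suggest_approach("very_low", age_related=False))
print(suggest_approach("moderate", age_related=True))
print(suggest_approach("high", age_related=True))
```

The high-cost check comes before the age-related check because time-based maintenance alone may not prevent expensive failures in critical or high-value assets.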

Also note that technology providers are continually improving condition monitoring and predictive solutions. As providers lower system costs and make these systems easier to set up and use, users can cost-effectively move from Reactive or Preventative approaches to Condition-Based or Predictive approaches.

Identify Failures Before They Happen: The PF Curve

The P-F curve is often mentioned in condition monitoring and predictive maintenance discussions. “P-F” refers to the interval between the detection of a potential failure (P) and the occurrence of a functional failure (F). The P-F curve is an illustrative generalization of what happens to an asset, machine or component as it ages, degrades, and eventually fails. It shows the different stages of an asset’s life, how machine failures progress, and how and when different symptoms emerge which might signal impending (or actual) failure.

The time scale in Fig. 1 is obviously exaggerated, and most assets operate for a lengthy period of time before failure starts to occur. The steepness of the failure portion of the curve can vary from asset to asset, but it generally follows the same pattern as shown in the diagram.

At first, performance degradation is minor and may not require significant action. As time progresses, the potential failure indicators become stronger and more easily detectable and the performance degradation becomes more severe, eventually ending in catastrophic failure.

The timeline is split into three domains:

      • Proactive domain – the failure is relatively far off (machine may still be new). Proactive activities include designing for reliability, precision installation & alignment and life cycle asset management. These can significantly extend the time until potential and functional failures occur.
      • Predictive domain – the failure may still be far off, but symptoms are emerging and offer (relatively) early warning signs. Timely action may be taken to prevent failure or replace failing equipment before catastrophic failure occurs.
      • Fault domain – the failure is occurring or inevitable, and symptoms indicate immediate action is needed to address the failure.

During these domains, different indicators/symptoms emerge. Ultrasonic, vibration and oil analysis often signal problems early; then temperature rise and noise emerge a bit later; and finally, parts come loose and more severe damage occurs. Depending on the asset, other indicators may be shown by activities including corrosion monitoring, motor current/power analysis and process parameter trending (e.g., flows, rates, pressures, temperatures, etc.).

By analyzing which symptoms of failure are likely to appear in the predictive domain for a given piece of equipment, you can determine which failure indicators to prioritize in your own condition monitoring and predictive maintenance discussions.


HALT…The Quest for Reliability


It’s a given that everything man-made can potentially fail at some point during its useful lifetime. Designers and users of equipment would very much like to predict how long something is likely to last, how frequently failures can be expected, and what application conditions can lead to excessive failure rates.

One traditional measure of reliability is MTTF, or Mean Time To Failure. In the case of electronics, it’s a calculated number based on the failure rates of individual electronic components that make up the complete assembly.
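As a rough illustration of how such a calculated figure can be obtained, the Python sketch below uses the common simplification that every component has a constant failure rate and that the failure of any one component fails the whole assembly. Under those assumptions the assembly failure rate is the sum of the component rates, and MTTF is its reciprocal. The component names and FIT values (failures per billion device-hours) are hypothetical, not data for any real product.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def assembly_mttf_hours(component_fits):
    """MTTF of a series assembly with constant component failure rates.

    component_fits maps a component name to its failure rate in FIT.
    """
    total_fit = sum(component_fits.values())
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return HOURS_PER_FIT_UNIT / total_fit


# Hypothetical failure rates for a small electronic assembly.
fits = {"capacitor": 20, "voltage_regulator": 50, "microcontroller": 130}
print(assembly_mttf_hours(fits))
```

With these assumed rates totaling 200 FIT, the calculated MTTF is 5,000,000 hours. This is a statistical figure derived from component data rather than from testing the assembly itself.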

There’s a growing trend toward determining reliability through physical testing. One such method is called HALT, or Highly Accelerated Life Test. The goal of HALT is to subject electronic products to extreme conditions that will induce premature aging and stress to uncover weaknesses in the design or components. These weaknesses can then be designed out of the product during product development, before they ever reach the end-user application.
