Condition Monitoring & Predictive Maintenance: Machine Failure Indicators & Detection Methods

In our previous blogs, we discussed the basics of the P-F (Potential – Functional Failure) curve and the cost-benefit tradeoffs of various maintenance approaches. We’ll now describe the measures that can be taken to discover failure indicators along the P-F curve.

The basic concept of the P-F curve is that as a machine or asset deteriorates, various symptoms/indicators emerge. The early-stage indicators may be harder to detect and may require more sophisticated and expensive systems to analyze, but they give you more time to take action to prevent a catastrophic failure. They allow users to choose times to service a machine when it’s less disruptive to the manufacturing process and when only minor maintenance actions, such as changing lubricant, replacing a filter or balancing a fan, are needed rather than major parts repair/replacement. The later-stage indicators may be more obvious and simpler to notice, but they may require extensive and expensive maintenance since greater deterioration has taken place.

Some monitoring methods can be done on a continual basis by using a permanently mounted sensor that takes samples at intervals of once an hour or more often. Others can only be done on a one-time or periodic basis, as when a sensor is brought in for special analysis, perhaps once a month or less often.

Common indicators and detection methods

This version of the P-F curve lists several common indicators and detection methods, in rough order of when they might start to reveal deterioration in an asset:

    • Ultrasonic Spike Energy. Ultrasonic condition monitoring sensors are often expensive and used in portable systems to take one-time readings, but they can provide very early potential failure detection.
    • Vibration Analysis. Sensors and evaluation tools can range from very simple and low cost to sophisticated and expensive. Vibration analysis can be done on a one-time, periodic, or continual basis and often gives an early insight into emerging problems.
    • Oil Analysis. An oil analysis may signal the need for additional, relatively simple maintenance actions, such as lubricating bearings, changing lubricant, or scheduling maintenance. It is usually done as a one-time test, though it can also be repeated periodically, such as monthly or annually.
    • Temperature Analysis. This analysis can indicate emerging “hot spots” on a machine, such as bad bearings or excessive friction that signal a future failure. Depending on the measurement system and asset, it can be an early or a late indicator of impending failure.
    • Pressure & Flow. These indicators can fall into either the predictive or the fault domain, depending on implementation. If a proactive approach is taken, they might be condition indicators that can provide an early indication of potential failure; if a reactive approach is taken, they might be indicators of a functional failure (failure already occurring).
    • Audible Noise. Noise is often an indicator of deterioration moving into the fault domain and requires more immediate action than vibration, temperature, or ultrasonic indicators.
    • Hot to Touch. Generally, once bearings, motors, or shafts become hot to the touch, failure is imminent and quick action is needed to avoid catastrophic failure.
    • Mechanically Loose. Checking for looseness typically falls under preventative maintenance (maintenance performed at time-based intervals rather than based on need) and may not catch impending failures until it is too late. Parts that are obviously loose can indicate a deeper problem, often close to failure.
    • Ancillary Damage. This indicator appears when other parts of the machine/asset are being damaged prior to a catastrophic failure (for example, a damaged belt due to belt misalignment caused by a failing bearing). Generally, when this is found, it is too late to prevent the failure of the asset.

This list does not cover all possible indicators. Machine users and builders may have others depending on their unique application – other potential methods to detect asset deterioration include monitoring of current, corrosion, or leaks.

The “best” indicator and approach will depend on each user’s and each asset’s unique risk/cost/benefit profile. Machine builders and users should work closely with an experienced condition monitoring solution provider who provides multiple solutions to help consider and assess the tradeoffs associated with various approaches.

Condition Monitoring & Predictive Maintenance: Cost-Benefit Tradeoffs

In a previous blog post, we discussed the basics of the Potential-Functional Failure (P-F) curve, which refers to the interval between the detection of a potential failure and the occurrence of a functional failure. In this post we’ll discuss the cost-benefit tradeoffs of various maintenance approaches.

In general, the goal is to maximize the P-F interval, which is the time between the first symptoms of impending failure and the functional failure taking place. In other words, you want to become aware of an impending failure as soon as possible to allow more time for action. This, however, must be balanced with the cost of the methods of prevention, inspection, and detection.

There is a trade-off between the cost of systems to detect and predict the failures and how soon you might detect the condition. Generally, the earlier the detection/prediction, the more expensive it is. However, the longer it takes to detect an impending failure (i.e., the more the asset’s condition degrades), the more expensive it is to repair it.

Every asset will have a unique trade-off between the cost of failure prevention (detection/prediction) and the cost of failure. This means some assets probably call for earlier detection methods that come with higher prevention costs, like condition monitoring and analytics systems, due to the high cost to repair (see the Prevention-1 and Repair-1 curves in the Cost-Failure/Time chart). And some assets may be better suited for more cost-efficient but delayed detection or even a “run-to-failure” model due to lower cost to repair (the Prevention-2 and Repair-2 curves in the Cost-Failure/Time chart).
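
To make this trade-off concrete, here is a minimal Python sketch that compares two approaches by expected yearly cost (prevention cost plus the chance of failure times the cost of failure). The strategy names, costs, and probabilities are hypothetical placeholders chosen for illustration, not figures from the chart, and the simple cost model is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    prevention_cost: float      # yearly cost of detection/prevention (hypothetical)
    failure_probability: float  # chance of an unplanned failure per year (hypothetical)


def expected_annual_cost(strategy: Strategy, failure_cost: float) -> float:
    """Prevention cost plus the probability-weighted cost of a failure."""
    return strategy.prevention_cost + strategy.failure_probability * failure_cost


def cheapest_strategy(strategies: list[Strategy], failure_cost: float) -> Strategy:
    """Pick the strategy with the lowest expected yearly cost for this asset."""
    return min(strategies, key=lambda s: expected_annual_cost(s, failure_cost))


# Illustrative numbers only; every asset will have its own profile.
STRATEGIES = [
    Strategy("run-to-failure", prevention_cost=0.0, failure_probability=0.30),
    Strategy("condition monitoring", prevention_cost=5000.0, failure_probability=0.05),
]

if __name__ == "__main__":
    for failure_cost in (2_000.0, 100_000.0):
        best = cheapest_strategy(STRATEGIES, failure_cost)
        print(f"failure cost {failure_cost:>9.0f}: {best.name}")
```

With these made-up numbers, the cheap-to-repair asset favors run-to-failure, while the expensive-to-repair asset justifies the higher prevention cost of condition monitoring.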

 

There are four basic maintenance approaches:

Reactive

The Reactive approach has low or even no cost to implement but can result in a high repair/failure cost because no action is taken until the asset has reached a fault state. This approach might be appropriate when the cost of monitoring systems is very high compared to the cost of repairing or replacing the asset. As a general guideline, the Reactive approach is not a good strategy for critical and/or high-value assets because the cost of a failure is high.

Reactive approaches:

      • Offer no visibility
      • Fix only if it breaks – low overall equipment effectiveness (OEE)
      • High downtime
      • Uncertainty of failures

Preventative

The Preventative approach (maintenance at time-based intervals) may be appropriate when failures are age related and maintenance can be performed at regular intervals before anticipated failures occur. Two drawbacks to this approach are: 1) the cost and time of preventative maintenance can be high; and 2) studies show that only 18% of failures are age related (source: ARC Advisory Group). The remaining 82% of failures are “random,” due to improper design/installation, operator error, quality issues, machine overuse, etc. This means that taking the Preventative approach may mean spending time and money on unnecessary work, and it may not prevent expensive failures in critical or high-value assets.

Preventative approaches:

      • Scheduled tune ups
      • Higher equipment longevity
      • Reduced downtime compared to reactive mode

Condition-Based

The Condition-Based approach attempts to address failures regardless of whether they are age-based or random. Assets are monitored for one or more potential failure indicators, such as vibration, temperature, current/voltage, pressure, etc. The data is often sent to a PLC, local HMI, special processor, or the cloud through an edge gateway. Predefined limits are set and alerts (alarm, operator message, maintenance/repair) are only sent when a limit is reached. This approach avoids unnecessary maintenance and can give warning before a failure occurs. Condition-based monitoring can be very cost-effective, though very sophisticated solutions can be expensive. It is a good solution when the cost of failure is medium or high and known indicators provide a reliable warning of impending failure.
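
As a rough illustration of the "predefined limits" idea, the sketch below checks a set of readings against per-indicator limits and reports only the indicators that have reached a limit. The indicator names, the limit values, and the split into "warning" and "alarm" levels are assumptions made for this example; real limits depend on the asset and the sensor.

```python
# Hypothetical (warning, alarm) limits per indicator; real limits depend on the asset.
LIMITS = {
    "vibration_mm_s": (4.5, 7.1),
    "temperature_c": (70.0, 85.0),
}


def evaluate(readings, limits=LIMITS):
    """Return {indicator: level} for readings that have reached a limit.

    Readings below the warning limit, or for indicators without a
    configured limit, produce no alert at all.
    """
    alerts = {}
    for indicator, value in readings.items():
        if indicator not in limits:
            continue
        warning, alarm = limits[indicator]
        if value >= alarm:
            alerts[indicator] = "alarm"
        elif value >= warning:
            alerts[indicator] = "warning"
    return alerts


if __name__ == "__main__":
    # One sample from a hypothetical sensor: vibration is elevated, temperature is normal.
    print(evaluate({"vibration_mm_s": 5.0, "temperature_c": 60.0}))
```

In practice this kind of check might run in a PLC, an edge gateway, or a cloud service; the logic stays the same wherever it lives.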

Condition-based approaches:

      • Based on condition (PdM)
      • Enables predictive maintenance
      • Improves OEE, equipment longevity
      • Drastically reduces unplanned downtime

Predictive Analytics

Predictive Analytics is the most sophisticated approach and attempts to learn from machine performance to predict failures. It utilizes data gathered through Condition Monitoring, and then applies analysis or AI/Machine Learning to uncover patterns to predict failures before they occur. The hardware and software to implement Predictive Analytics can be expensive, and this method is best for high-value/critical assets and expensive potential failures.
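
The simplest version of "uncovering patterns" is a trend line. The sketch below fits a straight line to recent readings and estimates how long until the trend reaches a limit. This is a deliberately simple stand-in for the AI/Machine Learning methods mentioned above, not a description of any particular product, and the function names, units, and linear-trend assumption are all hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need readings taken at two or more distinct times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_until_limit(times, values, limit):
    """Estimate time from the last reading until the fitted trend hits the limit.

    Returns None when the trend is flat or improving, and 0.0 when the
    fitted line has already reached the limit.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing_time = (limit - intercept) / slope
    return max(0.0, crossing_time - times[-1])


if __name__ == "__main__":
    hours = [0, 1, 2, 3]              # hypothetical sample times, in hours
    vibration = [1.0, 2.0, 3.0, 4.0]  # hypothetical readings, rising steadily
    print(time_until_limit(hours, vibration, limit=10.0))
```

Real degradation is rarely linear, which is one reason more sophisticated models are used for high-value assets; the sketch only shows the basic idea of turning monitored data into a lead-time estimate.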

Predictive Analytics approaches:

      • Based on patterns – stored information
      • Based on machine learning
      • Improves OEE, equipment longevity
      • Avoids downtime

Each user will need to evaluate the unique attributes of their assets and decide on the best approach, weighing the cost of prevention (detection of potential failure) against the cost of repair/failure. In general, a Reactive approach is only best when the cost of failure is very low. Preventative maintenance may be appropriate when failures are clearly age-related. And advanced approaches such as Condition-Based monitoring and Predictive Analytics are best when the cost of repair or failure is high.

Also note that technology providers are continually improving condition monitoring and predictive solutions. As they lower condition monitoring system costs and make the systems easier to set up and use, users can cost-effectively move from Reactive or Preventative approaches to Condition-Based or Predictive approaches.

Identify Failures Before They Happen: The PF Curve

The P-F curve is often mentioned in condition monitoring and predictive maintenance discussions. “P-F” refers to the interval between the detection of a potential failure (P) and the occurrence of a functional failure (F).

The P-F curve is an illustrative generalization of what happens to an asset, machine or component as it ages, degrades, and eventually fails. It shows the different stages of an asset’s life, how machine failures progress, and how and when different symptoms emerge which might signal impending (or actual) failure.

The time scale in Fig. 1 is obviously exaggerated, and most assets operate for a lengthy period of time before failure starts to occur. The steepness of the failure portion of the curve can vary from asset to asset, but it generally follows the same pattern as shown in the diagram.

At first, performance degradation is minor and may not require significant action. As time progresses, the potential failure indicators become stronger and more easily detectable and the performance degradation becomes more severe, eventually ending in catastrophic failure.

The timeline is split into three domains:

      • Proactive domain – the failure is relatively far off (machine may still be new). Proactive activities include designing for reliability, precision installation & alignment and life cycle asset management. These can significantly extend the time until potential and functional failures occur.
      • Predictive domain – the failure may still be far off, but symptoms are emerging and offer (relatively) early warning signs. Timely action may be taken to prevent failure or replace failing equipment before catastrophic failure occurs.
      • Fault domain – the failure is occurring or inevitable, and symptoms indicate immediate action is needed to address the failure.

During these domains, different indicators/symptoms emerge. Ultrasonic, vibration and oil analysis often signal problems early; then temperature rise and noise emerge a bit later; and finally, parts come loose and more severe damage occurs. Depending on the asset, other indicators may be shown by activities including corrosion monitoring, motor current/power analysis and process parameter trending (e.g., flows, rates, pressures, temperatures, etc.).

By analyzing which symptoms of failure are likely to appear in the predictive domain for a given piece of equipment, you can determine which failure indicators to prioritize in your own condition monitoring and predictive maintenance discussions.
