Using Failure Data to Drive Sustainable Improvements

If you are lucky enough to have good failure data history in your CMMS, you are one of the few.  But even if you have the data, can you use it to make a difference to your organization?  Obviously, the data can be used to perform certain reliability engineering analyses, but what can those without reliability engineering experience do with the data?

Bad Actor / Pareto Analysis

Two simple analyses that anyone can run on their failure history are:

  • Bad Actor analysis identifies equipment that is experiencing repetitive failures.  It focuses on the frequency of failure only.
  • Pareto analysis identifies which equipment contributes most to a key factor such as unplanned downtime, total downtime, the number of failures, the cost of maintenance, or the cost of lost production.
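
As a rough illustration, both analyses can be sketched in a few lines of Python. The record layout (asset name, downtime hours), the sample values, and the `min_failures` threshold are assumptions made for this example; a real CMMS export will look different.

```python
from collections import Counter, defaultdict

# Hypothetical failure records exported from a CMMS: (asset, downtime hours)
failures = [
    ("Conveyor 1", 4.0),
    ("Pump 7", 1.5),
    ("Conveyor 1", 2.0),
    ("Mixer 2", 6.0),
    ("Conveyor 1", 3.0),
    ("Pump 7", 0.5),
]

def bad_actors(records, min_failures=2):
    """Assets with at least `min_failures` failures, most frequent first."""
    counts = Counter(asset for asset, _ in records)
    return [(a, n) for a, n in counts.most_common() if n >= min_failures]

def pareto(records):
    """Rank assets by total downtime, with the cumulative share of the total."""
    totals = defaultdict(float)
    for asset, hours in records:
        totals[asset] += hours
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0.0
    for asset, hours in ranked:
        running += hours
        rows.append((asset, hours, round(100 * running / grand_total, 1)))
    return rows

print(bad_actors(failures))
for row in pareto(failures):
    print(row)
```

The same `pareto` function works for any other key factor (cost of maintenance, cost of lost production) by swapping the second field of each record.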

Either of these analyses can be run on the failure data collected.  A less common approach is to apply them to the failure mode (the object code + the damage code + the cause code) instead of to equipment.  Analyzing failure modes may reveal underlying issues that are impacting your entire operation.  For example, if you find that Bearing Overheating Improper Lubrication is the most prevalent failure mode, look at the lubrication program across the site.  It may be that the bearings are not being lubricated or are being over-lubricated.
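
A minimal sketch of the failure-mode view, assuming each work order carries an object, damage, and cause code. The field order, the " / " separator, and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical work orders: (asset, object code, damage code, cause code)
work_orders = [
    ("Pump 7", "Bearing", "Overheating", "Improper Lubrication"),
    ("Fan 3", "Bearing", "Overheating", "Improper Lubrication"),
    ("Pump 7", "Seal", "Leaking", "Misalignment"),
    ("Mixer 2", "Bearing", "Overheating", "Improper Lubrication"),
]

def failure_mode(obj, damage, cause):
    """Build the failure mode from object + damage + cause codes."""
    return f"{obj} / {damage} / {cause}"

def failure_mode_ranking(records):
    """Count failures per failure mode and list the assets each one affects."""
    counts = Counter()
    assets = {}
    for asset, obj, damage, cause in records:
        mode = failure_mode(obj, damage, cause)
        counts[mode] += 1
        assets.setdefault(mode, set()).add(asset)
    return [(mode, n, sorted(assets[mode])) for mode, n in counts.most_common()]

for mode, count, affected in failure_mode_ranking(work_orders):
    print(mode, count, affected)
```

Listing the affected assets next to each failure mode shows at a glance whether an issue is local to one machine or spread across the site.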

By taking this approach, addressing an issue on one asset can impact many assets across the site. However, while these analysis techniques can be used to identify issues and causes, they may not be enough to validate the effectiveness of your maintenance strategy.

Linking Failures to Maintenance Strategies

Over the past year or two, there has been some discussion around linking the failure codes from your CMMS to the maintenance strategy development technique (RCM, FMEA, MTA, PMO, RCM Blitz).  Many software packages do not support this linkage.  I have heard of a few approaches, but not having done this before (although I fully see the value), I decided to build out a way to do it without any special software, one that is simple enough for any organization to use.  With this approach, any organization can not only see whether its maintenance strategy is effective but also identify the gaps in it.

Codifying the Maintenance Activities

The first step in linking the failure codes to a maintenance activity is to go through all of the maintenance activities and codify the failure mode each task is addressing.  For example, a PM activity that measures a conveyor slat is in place to monitor the wear on the conveyor.  The failure mode this task is trying to address is Conveyor Slat Worn Normal Wear.

This activity needs to be completed for each specific task covered in the RCM, FMEA, or other analysis.   Ideally, a column would be added to the analysis to hold this Failure Mode.

An additional benefit of performing this activity is that it will identify gaps in the failure coding (master library).  Updating the master library will provide more accurate data and result in fewer “other” or free-text codes from the frontline staff.
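
One way to picture the codified task list and the master-library check is the sketch below. The task names, the tuple layout for a failure mode, and the library contents are all hypothetical.

```python
# Hypothetical master library of valid failure modes: (object, damage, cause)
master_library = {
    ("Bearing", "Overheating", "Improper Lubrication"),
    ("Seal", "Leaking", "Misalignment"),
}

# Hypothetical maintenance tasks, each codified with the failure mode it addresses
tasks = [
    {"task": "Grease motor bearings",
     "mode": ("Bearing", "Overheating", "Improper Lubrication")},
    {"task": "Check belt tracking",
     "mode": ("Belt", "Worn", "Misalignment")},
    {"task": "General inspection", "mode": None},  # not yet codified
]

def uncodified_tasks(tasks):
    """Tasks that still need a failure mode assigned."""
    return [t["task"] for t in tasks if t["mode"] is None]

def library_gaps(tasks, library):
    """Failure modes used by tasks but missing from the master library."""
    used = {t["mode"] for t in tasks if t["mode"] is not None}
    return sorted(used - library)

print(uncodified_tasks(tasks))
print(library_gaps(tasks, master_library))
```

In a spreadsheet, the same idea is simply an extra failure mode column on the task list plus a lookup against the code library.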

Comparing Failure Data to Maintenance Activities

With all of the maintenance activities codified, the comparison can now begin.  When comparing the failure data to the maintenance activities, start at the asset level.  Trying to do this across many assets or areas can be overwhelming, so start small.  There are a few ways in which the data can be compared:

  • Maintenance Activities Not Present (MANP) – Identify any failure modes from the failure data that are not present in the maintenance activities’ codes.    MANP identifies a gap in the maintenance strategy.
  • Maintenance Activities Not Frequent (MANF) – Identify any failure modes that are frequent in the failure data but infrequent in the maintenance activities’ codes.  MANF identifies a potential gap in the maintenance strategy.
  • Failure Data Not Present (FDNP) – Identify any failure modes in the maintenance activities’ codes that are not present in the failure data.  FDNP identifies over-maintained equipment and wasted resources.
  • Failure Data Not Frequent (FDNF) – Identify any failure modes that are frequent in the maintenance activities’ codes but infrequent in the failure data.  FDNF identifies potentially over-maintained equipment and potentially wasted resources.
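
The four comparisons above can be sketched for a single asset as follows. The article does not define "frequent", so the two thresholds, the counting period, and the sample counts are illustrative assumptions only.

```python
def compare(failure_counts, task_counts, frequent_failures=3, frequent_tasks=12):
    """Classify failure modes for one asset into the four gap categories.

    failure_counts: failure mode -> failures in the review period
    task_counts:    failure mode -> scheduled task executions in the same period
    The thresholds are illustrative assumptions, not values from the article.
    """
    gaps = {"MANP": [], "MANF": [], "FDNP": [], "FDNF": []}
    for mode in sorted(set(failure_counts) | set(task_counts)):
        failures = failure_counts.get(mode, 0)
        tasks = task_counts.get(mode, 0)
        if failures > 0 and tasks == 0:
            gaps["MANP"].append(mode)   # failing, no task addresses it
        elif tasks > 0 and failures == 0:
            gaps["FDNP"].append(mode)   # task exists, no failures recorded
        elif failures >= frequent_failures and tasks < frequent_tasks:
            gaps["MANF"].append(mode)   # fails often, rarely maintained
        elif tasks >= frequent_tasks and failures < frequent_failures:
            gaps["FDNF"].append(mode)   # maintained often, rarely fails
    return gaps

# Hypothetical one-year counts for a single asset
failure_counts = {
    "Bearing / Overheating / Improper Lubrication": 5,
    "Seal / Leaking / Misalignment": 2,
    "Belt / Worn / Normal Wear": 1,
}
task_counts = {
    "Bearing / Overheating / Improper Lubrication": 4,
    "Belt / Worn / Normal Wear": 52,
    "Motor / Vibration / Imbalance": 12,
}

for category, modes in compare(failure_counts, task_counts).items():
    print(category, modes)
```

Failure modes that fall into none of the four categories are left out of the result, which keeps the output focused on the gaps.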

These four relationships can provide great insight into not only the effectiveness of the maintenance strategy but also its efficiency.

Addressing the Gaps Between Failure Data & Maintenance Activities

With the comparison complete, the gaps can start to be addressed.  As with any analysis, start with the gaps that would provide the greatest return for the least amount of input.  Each of the gaps above can be addressed as follows:

  • MANP – Perform an RCM, FMEA, etc. analysis to address the missing failure modes through maintenance activities or other actions such as redesign.
  • MANF – Review the RCM, FMEA, etc. analysis to identify potential gaps or opportunities to improve the existing maintenance strategy, based on the failure modes that occur frequently in the failure data.
  • FDNP – Review all follow-up work generated from PMs, PdMs, etc. to see if the specific failure modes have been caught before they resulted in a functional failure.  If the maintenance strategy is not generating follow-up work, perform a risk analysis to remove the maintenance activity.  If it is generating follow-up work, perform a Weibull analysis to determine the optimum frequency of the maintenance activity.
  • FDNF – Review all follow-up work generated from PMs, PdMs, etc.  to see if the specific failure modes have been caught before they resulted in a functional failure.  If the maintenance strategy is generating the occasional follow-up work, perform a Weibull analysis to determine the optimum frequency of the maintenance activity.
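
The decision logic in the list above can be written down as a small helper. The function name and the returned wording are hypothetical. The list does not say what to do for FDNF when no follow-up work is found, so the sketch assumes that case is handled like FDNP.

```python
def next_step(gap, follow_up_work_orders=0):
    """Suggest the next action for a gap category, following the list above."""
    if gap == "MANP":
        return "Perform an RCM/FMEA analysis to cover the missing failure mode"
    if gap == "MANF":
        return "Review the existing RCM/FMEA analysis for this failure mode"
    if gap in ("FDNP", "FDNF"):
        if follow_up_work_orders > 0:
            return "Perform a Weibull analysis to set the task frequency"
        # Assumption: FDNF with no follow-up work is treated like FDNP here.
        return "Perform a risk analysis before removing the task"
    raise ValueError(f"Unknown gap category: {gap}")

print(next_step("FDNP", follow_up_work_orders=0))
print(next_step("FDNF", follow_up_work_orders=3))
```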

By linking the failure data to the maintenance activities, any organization can determine how effective and efficient their maintenance program is.  This will allow the organization to achieve the balance between asset performance, cost, and risk.

On a side note, be sure to track any changes made to the maintenance program based on this analysis.  Identify the cost savings attributed to a more effective (avoid unplanned downtime) or a more efficient (reduction in PM workload) maintenance program.  You might just be surprised at the return on this analysis.

In summary, to link the failure data to maintenance activities, you must:

  1. Codify the maintenance activities
  2. Compare failure data to maintenance activities
  3. Identify gaps between the failure data and maintenance activities
  4. Address the gaps between the failure data and maintenance activities
  5. Perform the accepted analysis (RCM, FMEA, etc.) to address any missing failure modes from the analysis
  6. Update the master library with any missing failure codes
  7. Update maintenance activities with findings
  8. Repeat!

Do you link your failure data to your maintenance activities?  How do you perform the linkage and analysis?  If you don’t currently link the two together, how do you identify gaps in your maintenance program?

Remember, to find success, you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

I’m James Kovacevic
Eruditio, LLC
Where Education Meets Application