Improve your failure data to improve the speed and accuracy of your failure & reliability analysis.

A few years into your reliability journey, you start to struggle to make the kind of improvements you made when you first started.  Why is this?  You were able to systematically eliminate all of the low-hanging fruit using the existing data in your CMMS.  But now you have to dig deeper to realize further improvements, and that requires better data.

As Fred discussed in the previous post, a Mean Cumulative Failure Analysis can be another powerful tool in your reliability toolbox, without requiring extensive failure data.  As with all tools, it has applications in which it excels and others in which it does not.  This is why we have focused on providing you with a variety of tools you can use.

Using these tools does require failure data, and as you have seen, you already have failure data in your system.  However, there is an easier way to perform these analyses and to make use of more advanced ones.  It requires better failure data.  Thankfully, if you set up your CMMS correctly, collecting better failure data does not have to be an issue.

Standardizing Failure Data

Standardizing failure data begins with the standardization of equipment data.  By standardizing equipment data, you can define the components and the related problems and causes that are specific to each type of equipment.  This is the first step in codifying the failure data (more on that later).  Standardizing equipment data involves many different aspects, such as:

  • Equipment Taxonomy: how the equipment is organized in a hierarchical structure and how it is named.
  • Equipment Classes: the high-level grouping of like equipment, such as pumps.
  • Equipment Types: a refinement of the class; for the class of pump, you may have centrifugal, positive displacement, etc.
  • Equipment Boundaries: what is included in a piece of equipment and what is not.
  • Equipment Attributes: the unique attributes of the equipment, such as manufacturer, model and performance specifications.
  • Equipment Operating Parameters: how the equipment is used in the organization.  Operating parameters include criticality, location, etc.
  • Equipment Components, Problems & Causes: all relevant failure information for the specific equipment.

Thankfully, there is a standard to assist with this.  ISO 14224 – Collection and exchange of reliability and maintenance data for equipment is an ISO standard that works to standardize all Maintenance & Reliability data.  Although developed for the Oil & Gas industry, it has been successfully applied in almost all industries.
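
As a rough illustration of what standardized equipment data might look like in practice, the sketch below models a single equipment record in Python.  The field names, the tag format and the example values are assumptions made for illustration only; they are not taken from ISO 14224 or from any particular CMMS.

```python
from dataclasses import dataclass, field


@dataclass
class Equipment:
    """One standardized equipment record (hypothetical field names)."""
    tag: str                # position in the taxonomy, e.g. site-area-unit
    equipment_class: str    # high-level grouping, e.g. "Pump"
    equipment_type: str     # refinement of the class, e.g. "Centrifugal"
    boundary: list[str] = field(default_factory=list)         # what is included
    attributes: dict[str, str] = field(default_factory=dict)  # manufacturer, model, ...
    operating: dict[str, str] = field(default_factory=dict)   # criticality, location, ...


# Example values are invented for this sketch.
pump = Equipment(
    tag="SITE1-AREA2-P101",
    equipment_class="Pump",
    equipment_type="Centrifugal",
    boundary=["Pump unit", "Coupling", "Seal system"],
    attributes={"manufacturer": "ExampleCo", "model": "X-100"},
    operating={"criticality": "High", "location": "Area 2"},
)

# Equipment of the same class and type can share one set of
# components, problems and causes.
class_key = (pump.equipment_class, pump.equipment_type)
print(class_key)
```

Keying the failure information on class and type, rather than on each individual asset, is one way to keep the later code lists short.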

The next step in standardizing failure data is to define what a failure is and what data needs to be recorded when a failure occurs.  The definition of a failure should be based on the work from RCM2.  A failure should be defined as the inability of the equipment to meet the performance requirements for its primary and secondary functions.  Once this occurs, the failure data should be recorded, which should include:

  • Failure Date & Time
  • Equipment / Subunit affected
  • Failure Mode: the individual component that failed, the problem observed and the cause of the failure
  • Failure Consequence: the impact on operations or on environment, health or safety.  This may include complete system shutdown, environmental discharge, etc.
  • Detection Method: how the failure was found, such as during periodic maintenance, during a functional test, etc.
  • Operating Condition at Failure: what was the equipment doing when it failed, such as starting up, shutting down or operating.
  • Date & Time of Equipment being restored to an operational state.

All of the failure data requirements are covered in detail in the ISO 14224 standard.
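
To make the list above concrete, here is a minimal sketch of a failure record in Python.  The class name, field names and example values are hypothetical; the point is only that capturing both the failure time and the restoration time lets downtime be calculated rather than estimated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FailureRecord:
    """One failure event (hypothetical field names)."""
    failed_at: datetime          # failure date & time
    equipment: str               # equipment / subunit affected
    component: str               # failure mode: component that failed
    problem: str                 # failure mode: problem observed
    cause: str                   # failure mode: cause of the failure
    consequence: str             # impact on operations, environment, health or safety
    detection_method: str        # how the failure was found
    operating_condition: str     # what the equipment was doing when it failed
    restored_at: datetime        # date & time restored to an operational state

    def downtime(self) -> timedelta:
        """Time between the failure and the return to service."""
        return self.restored_at - self.failed_at


# Example values are invented for this sketch.
record = FailureRecord(
    failed_at=datetime(2024, 3, 1, 8, 30),
    equipment="SITE1-AREA2-P101",
    component="Bearing",
    problem="Seized",
    cause="Lubrication",
    consequence="Complete system shutdown",
    detection_method="Functional testing",
    operating_condition="Starting up",
    restored_at=datetime(2024, 3, 1, 14, 0),
)

print(record.downtime())  # 5:30:00
```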

Making it Easy

One of the best things that can be done to ensure quick, accurate data entry is to codify the inputs.  So what is codification?  Codification involves designating a unique code for each item of failure data.  This eliminates the need to mine the data by reading comments and extracting the information manually.  Codes may resemble the following:

  • Component (Object): Bearing – Code: O0002
  • Problem: Seized – Code: P0038
  • Cause: Lubrication – Code: C0003
  • Detection Method: Functional Testing – Code: D0002
  • Activity: Adjust – Code: A0004
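
The sketch below shows, under simple assumptions, why coded data is easier to analyze than free-text comments: counting failure modes becomes a one-line operation.  The codes O0002, P0038 and C0003 come from the example above; the remaining codes and the list of work orders are invented for illustration.

```python
from collections import Counter

# Each tuple is (component code, problem code, cause code).
# O0002 / P0038 / C0003 are the example codes above; the others are made up.
work_orders = [
    ("O0002", "P0038", "C0003"),
    ("O0002", "P0038", "C0003"),
    ("O0005", "P0011", "C0003"),
    ("O0002", "P0012", "C0007"),
]

# Count complete failure modes (component + problem + cause).
failure_modes = Counter(work_orders)

# Count how often each cause code appears, regardless of component.
by_cause = Counter(cause for _, _, cause in work_orders)

top_mode, top_count = failure_modes.most_common(1)[0]
print(top_mode, top_count)  # ('O0002', 'P0038', 'C0003') 2
print(by_cause["C0003"])    # 3
```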

The number of codes can quickly grow out of control, so it is vital that the failure coding is set up to accommodate the vast amount of equipment at the site while keeping the codes to a minimum.  To achieve this, the codes are built using a defined structure that provides the right level of granularity without overcomplicating the process.

  • All Components (Objects) should be listed in a single list starting with “O” for the object.  These are the individual components that make up each sub-assembly.  Start by looking at the individual components listed in your FMEAs or RCM to ensure the two are aligned.
  • Problems should be listed in a single list starting with P.  These problem codes could be streamlined to 6 major categories, with 38 codes.  This is available in the failure mechanism table in ISO 14224.
  • Causes should be listed in a single list starting with C.  The causes could be streamlined to 5 major categories, with 20 unique codes.  This is available in the failure cause table in ISO 14224.
  • Detection Methods should be listed in a single list starting with D.  These detection methods can be summarized in 10 codes.  These are also available in ISO 14224.
  • Activities should be listed in a single list starting with A.  The unique activity types are limited to 12 and can be found in ISO 14224.
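
A structure like this is easy to check automatically.  The following is a minimal sketch, assuming each code is the list's prefix letter followed by exactly four digits; that format is inferred from the example codes earlier in this post and is not prescribed by ISO 14224.  The function and dictionary names are hypothetical.

```python
import re

# One prefix letter per list, as described above.
PREFIXES = {
    "component": "O",
    "problem": "P",
    "cause": "C",
    "detection": "D",
    "activity": "A",
}


def is_valid_code(kind: str, code: str) -> bool:
    """Return True if `code` has the prefix for `kind` plus four digits.

    The four-digit format is an assumption based on the example codes.
    """
    prefix = PREFIXES[kind]
    return re.fullmatch(rf"{prefix}\d{{4}}", code) is not None


print(is_valid_code("component", "O0002"))  # True
print(is_valid_code("problem", "C0003"))    # False: a cause code in the problem list
```

A check like this can catch a cause code entered in the problem field before it ever reaches the analysis.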

However, when setting up the codes, you must keep a few key points in mind:

  • Mutually Exclusive: Codes should be mutually exclusive, which means there should be only one appropriate code in each list.
  • KISS: While the full list of codes may be extensive, be sure to display only the codes that are relevant to the specific equipment for which the data is being reported.  The codes should fit on a single list.  This can be accomplished in most CMMSs by linking the Components to the equipment class and type.
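
Linking components to the equipment class and type can be pictured as a simple lookup.  The sketch below is illustrative only: the mapping, the function name and every code except O0002 (Bearing) are invented, and a real CMMS would hold this relationship in its own configuration rather than in Python.

```python
# Hypothetical master list of component codes and names.
# Only O0002 (Bearing) appears in this post; the rest are made up.
COMPONENT_NAMES = {
    "O0001": "Impeller",
    "O0002": "Bearing",
    "O0003": "Mechanical seal",
    "O0004": "Piston",
}

# Hypothetical link between (equipment class, equipment type) and the
# component codes that are relevant to that equipment.
COMPONENTS_BY_EQUIPMENT = {
    ("Pump", "Centrifugal"): ["O0001", "O0002", "O0003"],
    ("Pump", "Positive displacement"): ["O0002", "O0004"],
}


def pick_list(equipment_class: str, equipment_type: str) -> list[tuple[str, str]]:
    """Return only the component codes relevant to this class and type."""
    codes = COMPONENTS_BY_EQUIPMENT.get((equipment_class, equipment_type), [])
    return [(code, COMPONENT_NAMES[code]) for code in codes]


for code, name in pick_list("Pump", "Positive displacement"):
    print(code, name)
```

The person entering the data sees two choices for a positive displacement pump instead of the full master list.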

Using this structured system and keeping these few items in mind will ensure that the system remains easy to use while ensuring quality data is provided for the analysis.   Before rolling out the failure codes, have your staff review and work with the system.  This will ensure all codes are aligned, nothing is missing and it is easy to use.

Roll It Out

With the hard work completed, set a date to roll out the new failure coding and standards.  During the first week, the staff may need support, even with the previous training.  Be sure to set aside time to support the team directly.  It is during this initial phase that the staff will decide whether or not to support the new failure data.

After a few weeks, review the new data and see whether it is meeting expectations.  If not, determine why and adjust.  Armed with this new and improved failure data, you can now take your reliability efforts to the next level.

In the next post, Fred Schenkelberg will provide an overview of additional analysis methods available in the reliability engineering discipline and how they can be used to deliver benefits to your organization.

Remember, to find success, you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

I’m James Kovacevic
Eruditio, LLC
Where Education Meets Application


Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series at Accendo Reliability.

The other articles in the series include:
Post 1 – Using the Maintenance Data You Already Have
Post 2 – The What & More Importantly, The Why of the Weibull Analysis
Post 3 – Quantify the Improvements with a Crow-AMSAA (or RGA)
Post 4 – Using a Mean Cumulative Plot
Post 5 – The Next Step in Data
Post 6 – The Next Step in Your Data Analysis
Post 7 – Data Q&A with Fred & James

Fred Schenkelberg
FMS Reliability
Accendo Reliability
ISO 14224
The Basics of FMEA