Preventing The Consequences Of A Hidden Failure From Devastating Your Organization.

Ever wonder how some of the worst industrial disasters occur?  They are usually the result of multiple failures: failure of the primary system and failure of the protective systems.   Ensuring the protective system(s) are not in a failed state should be of utmost importance to any organization.  But how often should we test the protective systems to ensure the required availability?

Establishing the correct frequencies for the inspection/testing activities of these protective system(s) is critical not only to the success but also to the safety and reputation of any organization.   Test too infrequently, and the organization is at risk of a major incident.  Test too frequently, and the organization is subjected to excess planned downtime, an increased probability of maintenance-induced failures and increased maintenance cost.
This article continues the discussion on establishing the correct inspection frequency in a maintenance program.  There are three different approaches to use, based on the type of maintenance being performed.

This article will focus on Failure Finding Maintenance.

What Are Protective Systems, Hidden Failures and Failure Finding Maintenance

A protective system or device is a system or device which is designed to mitigate or reduce the consequences of a failure.  These consequences may be safety, environmental or operational in nature.   These devices or systems are designed to:

  • Alert – to potential problem conditions (e.g. alarm)
  • Relieve – prevent failure conditions causing greater problems (e.g. pressure relief valve)
  • Shutdown – stop a process to prevent greater problems from occurring (e.g. motor overload)
  • Mitigate – alleviate the consequences of a failure (e.g. fire suppression equipment)
  • Replace – continue to provide a function by an alternative means (e.g. back up pump)
  • Guard – prevent an accident from occurring  (e.g. E-Stop)

Knowing what a protective device or system is, you may see that if a pressure relief valve became corroded and seized in the closed position, it would not be evident to the operators.   This is a hidden failure.   A hidden failure can be defined as a failure which, if it occurs on its own, is not evident to the operating crew under normal circumstances.  Obviously, this could lead to significant consequences if the tank that the pressure relief valve is protecting is overpressurized.   This is where failure finding maintenance comes in.

Failure-finding maintenance is a set of tasks designed to detect or predict failures in the protective systems or devices, reducing the likelihood that the protective system and the protected equipment fail at the same time.  So how do you determine how often the protective systems should be checked for failure?  Establish the frequency using a formula.

Establishing Failure Finding Maintenance Frequencies Using Formulas

There is a single formula that takes all of the variables into consideration to establish the failure finding interval (FFI):  FFI = (2 x MTIVE x MTED) / MMF

All three inputs must be expressed in the same unit of time (e.g. years):

  • MTIVE = Mean Time Between Failures (MTBF) of the protective device or system
  • MTED = Mean Time Between Failures of the Protected Function
  • MMF = Mean Time Between Multiple Failures
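
One way to see where the formula comes from is sketched below. It assumes constant failure rates and a test interval that is short compared with the MTBF of the protective device, so that the device's average unavailability stays small. The symbol U (average unavailability of the protective device) is introduced here for illustration only.

```latex
% U = average fraction of time the protective device is in a failed state
% (U is an assumed symbol, not part of the original formula)
\begin{align*}
U &\approx \frac{\mathrm{FFI}}{2 \times \mathrm{MTIVE}}
  && \text{a failure sits undetected for half an interval on average} \\
\frac{1}{\mathrm{MMF}} &= \frac{U}{\mathrm{MTED}}
  && \text{a multiple failure needs a demand while the device is failed} \\
\mathrm{FFI} &= \frac{2 \times \mathrm{MTIVE} \times \mathrm{MTED}}{\mathrm{MMF}}
  && \text{substitute and solve for FFI}
\end{align*}
```

The approximation in the first line is what limits the formula to cases where the required unavailability is low.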

An example from RCM2 shows how this works.  Consider a duty pump with a standby pump, where:

  • The probability of a multiple failure must be less than 1 in 1000 in any one year (MMF = 1000 years)
  • The rate of unanticipated failures of the duty pump is 1 in 10 years (MTED = 10 years)
  • The rate of unanticipated failures of the standby pump is 1 in 8 years (MTIVE = 8 years)

Therefore the correct failure finding interval would be:

  • FFI = (2 x 8 x 10) / 1000
  • FFI = 160 / 1000
  • FFI = 0.16 years
  • 0.16 years x 12 months per year = 1.92 months, or roughly 2 months

This indicates that the standby pump must be checked roughly every two months to verify it is fully operational.   If this check is not performed, the likelihood of a multiple failure increases.
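
The calculation above can be reproduced with a short script. This is a minimal sketch, not a validated tool: the function names `failure_finding_interval` and `implied_unavailability` are made up for illustration, and the second function relies on the common approximation that a tested device is unavailable for about half of each interval in which it fails.

```python
def failure_finding_interval(mtive_years, mted_years, mmf_years):
    """Return FFI = (2 x MTIVE x MTED) / MMF, with all values in years."""
    if min(mtive_years, mted_years, mmf_years) <= 0:
        raise ValueError("MTIVE, MTED and MMF must all be positive")
    return 2 * mtive_years * mted_years / mmf_years


def implied_unavailability(ffi_years, mtive_years):
    """Approximate average unavailability of the protective device.

    Assumption (not from the article): unavailability ~ FFI / (2 x MTIVE),
    which only holds when FFI is much shorter than MTIVE.
    """
    return ffi_years / (2 * mtive_years)


# Standby pump example: MTIVE = 8 years, MTED = 10 years, MMF = 1000 years
ffi = failure_finding_interval(mtive_years=8, mted_years=10, mmf_years=1000)
unavailability = implied_unavailability(ffi, mtive_years=8)

print(f"FFI = {ffi:.2f} years ({ffi * 12:.2f} months)")
print(f"Implied unavailability of the standby pump = {unavailability:.1%}")
```

Keeping the inputs in one unit (years here) and converting to months only at the end avoids the most common mistake when applying the formula by hand.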

Lastly, if the failure of the protective device can be caused by the failure finding task itself, there is another approach to be used, which is beyond the scope of this article.

Do you have a program in place to check your protective systems?  If not, are you aware of the risk that your organization is exposed to?   Take the time to identify your protective systems and establish your failure finding tasks.

Remember, to find success; you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

I’m James Kovacevic
Eruditio, LLC
Where Education Meets Application