Friday, July 29, 2011

Heinz Bloch Failure Codes as a Simple Start to Failure Coding

I get quite a few requests for these simple failure codes for use in Enterprise Asset Management Systems (EAMS) and Computerized Maintenance Management Systems (CMMS), so I thought I would post them for easy retrieval.

Remember, failure codes are only as good as the user's understanding of each code, so make sure to create training for your team and test for understanding prior to implementation.

If you use these examples as a failure coding system, please understand that this is a first step. As your organization matures with failure reporting, you may choose to move to a more in-depth failure code structure based on the Failure Modes and Effects Analysis (FMEA) that you complete. This will provide a new level of detail for reliability engineers and maintenance engineers to use in their quest to investigate, eliminate, and mitigate failures.

These definitions are drawn from the third edition of Heinz Bloch’s Machinery Failure Analysis and Troubleshooting. We have taken them and added examples. They can be a good first step on the way to a Failure Reporting, Analysis, and Corrective Action System (FRACAS) within your EAM or CMMS.

Faulty Design (FD)
Faulty design is when the equipment is not adequate for the intended service. Generally this results in conditions (e.g. temperature, force, pressure, vibration) that were not anticipated in the design and that exceed the equipment’s capabilities.

“In excess of 95 percent of fluid handling problems were either designed in or the result of a process change that did not account for its effects on all equipment.” Charles M. Boyles, Editor, Pumping Technology Magazine

Material Defects (MatD)
A material defect is a flaw in the material itself, such as a casting defect, or a non-uniform material (e.g. metal, elastomer) that results in a material property (e.g. strength, hardness, elasticity, chemical resistance) different from what was expected.

The material defect category should not be used when the wrong material is used in a particular service. That would be a faulty design if the design specified the wrong material, or a fabrication error if the correct material was called for but not used.

Fabrication or Processing Errors (FPE)
A fabrication / processing error indicates that the equipment and / or parts were not made correctly (as designed) during initial fabrication or were improperly reconditioned during a repair. These errors can be identified by verifying whether the components adhere to the original design (e.g. proper dimensions, fits and clearances, proper material, correct component balance).

Assembly or Installation Defects (AID)
An assembly / installation defect generally means that the equipment and/or parts may have been made correctly, but were not put together or installed correctly. This would include the assembly or repair of the equipment as well as field installation, which may result from improper or inadequate procedures or techniques. Assembly problems may include items such as improper bearing installation / handling procedures, incorrect orientation of grease shields, improper torque sequences, etc. Field installation problems may include items such as shaft misalignment, inadequate grouting, piping strain, etc.

Off-design or Unintended Service Conditions (USC)
An off-design / unintended service condition usually means that the design was adequate for the original service, but changes to the service have now made the design inadequate for the new service conditions. This can occur when changes have been made intentionally to the operating (service) conditions, creating an unintended effect on the equipment (a condition the equipment was not designed for). Examples may include a throughput change that causes cavitation or a temperature change that affects a material’s strength.

This failure category can occur when plant capacity is increased without a sufficient review of the effect on equipment. “We have found that a large percentage, 27%, of chronic machine and system problems is the direct result of operating production systems outside their acceptable design range.” - Keith Mobley, President, Integrated Systems, Inc

Examples: Equipment damage due to external acts such as lightning and other acts of nature as well as damage due to mobile equipment operator carelessness and intentional management changes to the SOPs.

Maintenance Deficiencies (MD)
A maintenance deficiency indicates that the equipment is not maintained correctly after it is designed and installed. This may include not replenishing / changing lubricants, not changing filters when they plug, allowing contamination to go untended (e.g. water or dirt in oil), or using inadequate procedures to perform such tasks (e.g. not removing the drain plug when re-greasing bearings).

The maintenance deficiency category would not normally be used to classify a problem that results from an improper repair. That type of defect would normally be the result of either a fabrication / processing error or an assembly / installation defect.

Improper Operation (IO)
As the name implies, improper operation results when the equipment is not operated correctly. This usually occurs with unintended changes to the equipment operating conditions. Examples would include such things as running a pump dry, deadheading a pump for prolonged periods of time, operating at a critical frequency for prolonged periods, as well as improper operating procedures (e.g. not venting a seal chamber to remove gas prior to commissioning).

Improper operation is often associated with upset conditions in the system or transient type conditions such as start-up and shutdown.

Examples: Incorrect or poorly written SOPs
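
If you want to try these codes out before configuring them in your EAM or CMMS, here is a minimal Python sketch of the seven categories above as a validated list, plus a simple tally of failures by code. The work order layout (an asset ID paired with a code), the asset IDs, and the function names are all assumptions made up for illustration; they are not taken from any particular CMMS.

```python
from collections import Counter
from enum import Enum


class FailureCode(Enum):
    """The seven failure categories from this post, keyed by their short codes."""
    FD = "Faulty Design"
    MatD = "Material Defects"
    FPE = "Fabrication or Processing Errors"
    AID = "Assembly or Installation Defects"
    USC = "Off-design or Unintended Service Conditions"
    MD = "Maintenance Deficiencies"
    IO = "Improper Operation"


def parse_failure_code(text):
    """Return the FailureCode matching text, ignoring case and surrounding spaces."""
    wanted = text.strip().lower()
    for code in FailureCode:
        if code.name.lower() == wanted:
            return code
    raise ValueError(f"Unknown failure code: {text!r}")


def tally_failures(work_orders):
    """Count work orders per failure code.

    work_orders is an iterable of (asset_id, code_text) pairs. This layout is
    a hypothetical stand-in for whatever your CMMS export looks like.
    """
    return Counter(parse_failure_code(code) for _, code in work_orders)


if __name__ == "__main__":
    # Made-up history, for illustration only.
    history = [("P-101", "IO"), ("P-101", "io"), ("P-205", "MatD"), ("C-310", "AID")]
    for code, count in tally_failures(history).most_common():
        print(f"{code.name:5} {code.value}: {count}")
```

Rejecting unknown codes at entry is the point of the sketch: a tally like this is only useful if every work order carries one of the agreed codes.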

Monday, July 25, 2011

Triggering Success with Root Cause Analysis

Many Root Cause Analysis (RCA) implementations fail because of one simple missing element: the triggers. These process gates control the flow of RCA investigations through the problem solving process. If they are nonexistent, you may have too many RCA investigations to handle; and if they are set up improperly, you could end up with too few or too many investigations.
Triggers may be a certain number of hours of downtime, a certain level of cost, or safety implications. These triggers decide both when an investigation needs to occur and to what level. For example, for an issue of downtime that equals one hour, you may involve a maintenance or reliability engineer, who may interview those involved and create a simple A3-style report. On the other side, if you had fourteen hours of downtime, you may involve a team of folks including engineers, operators, maintainers, supervisors, and others as required. With the more in-depth investigations, your triggers may suggest that you use additional tools and more complex documentation. All of this is determined when you build your RCA process flows.
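
To make the idea concrete, here is a minimal Python sketch of a two-level trigger check. The one-hour and fourteen-hour downtime values echo the example above; the cost thresholds, the rule that any safety implication goes straight to a team investigation, and all of the names are assumptions for illustration. Your own values come out of your RCA process flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triggers:
    """Thresholds that gate RCA investigations (illustrative values only)."""
    basic_downtime_hours: float = 1.0    # from the one-hour example above
    team_downtime_hours: float = 14.0    # from the fourteen-hour example above
    basic_cost: float = 5_000.0          # hypothetical cost threshold
    team_cost: float = 50_000.0          # hypothetical cost threshold


def investigation_level(downtime_hours, cost, safety_implication, triggers=Triggers()):
    """Return "team", "basic", or "none" for a single event.

    "basic" means one engineer and a simple A3-style report;
    "team" means a cross-functional investigation.
    Treating any safety implication as "team" is an assumption of this sketch.
    """
    if (safety_implication
            or downtime_hours >= triggers.team_downtime_hours
            or cost >= triggers.team_cost):
        return "team"
    if (downtime_hours >= triggers.basic_downtime_hours
            or cost >= triggers.basic_cost):
        return "basic"
    return "none"


if __name__ == "__main__":
    print(investigation_level(1.0, 0.0, False))    # basic
    print(investigation_level(14.0, 0.0, False))   # team
    print(investigation_level(0.5, 100.0, False))  # none
```

Keeping the thresholds in one place means they can be lowered later, for example `Triggers(basic_downtime_hours=0.5)`, without touching the decision logic.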
Missing the balance leads to one of two extremes. The most common is an influx of investigations, so many in fact that your RCA team members and facilitators cannot keep up. They spend all their time rushing through investigations, which leads to reports that have limited value and are completed to “check a box” and action items that are never followed up on or implemented. There is no return on investment for Root Cause Analysis until the action items have been completed, and the exercise is not over until the expected results have been proven. The second extreme is when the triggers are set too high and there are so few RCA investigations that your team forgets the methods and processes for the investigation. This leads to a loss of skill and a loss of results. One or two RCAs per year will get you to that low level of performance every time.
Leaders in organizations that are starting up or restarting RCA efforts need to be especially diligent to ensure that they follow the process. It is easy to become eager for results and ask for RCAs to be done when they fall outside of the ranges set in the triggers. This leads to the same overload as triggers that are set too low.
As the organization matures in the use of RCA problem solving, the triggers will move to new levels. This is simply because the participants will get better at using the tools and processes and will be able to do more investigations in less time. When this occurs, you lower your triggers to allow more investigations into the process. At this point, you are now investigating and resolving more root causes in the same amount of time.
Keep in mind, RCA is driven by the implementation of the report findings and the follow-up actions. Or to say it a different way, RCA is all about results, not reports. If you set your triggers right and stick with your process, you will see those results grow.

Monday, July 18, 2011

Hot Rods and Hand Tools: Where have the skilled trades gone and how do we deal with their absence?

Below is a portion of an article that I will be presenting at the Houston Society for Maintenance and Reliability Professionals Annual Conference. I thought you might enjoy the high points. If you would like the longer, more statistics-laden version, let me know.

Due to the changing economy and American culture, we are seeing a long-term shortage of skilled trades. These skilled trades include the industrial mechanics, machinists, and electricians who have kept American facilities producing the wealth to which we have become accustomed. This has been perpetuated by three compounding circumstances:
·         The common Baby Boomer myth that everyone’s kids have to go to university to be successful and the overall negative cultural bias toward blue-collar employment
·         The movement from working on cars and hot rods in the garage with your father to playing video games or working on computers.
·         And most recently, the inevitable Baby Boomer retirement, which has waned due to the economy, but is still a concern in the long run

Below are six elements that you can implement right now that will help to address the skilled trades shortage in your facility. These elements include the use of:
·         Clear documented business processes to ensure a smooth transition, continuous improvement, and employee involvement (which is key to the new generation)
·         Training as both a knowledge and a morale booster (training assessment and implementation)
·         Condition sensing technology (Predictive Maintenance (PdM) tools) in the realm of maintenance, which can bring the “cool” back to the skilled trades
·         Historical data in your Computerized Maintenance Management Systems (CMMS) or Enterprise Asset Management Systems (EAM)
·         Mentoring by the retiring generation who are now more open to both data capture and transition of knowledge as they prepare to leave the workforce (this has not been the case in the past)
·         Proactive Reliability (many of the new generation do not want extraordinary overtime and constant reactive firefighting in the facility and instead strive for the more predictable, less stressful world of a proactive reliability culture)

We are seeing many interesting situations and issues come together to create what some are calling the “skills crisis”, but it comes down to how we handle the transition using techniques like the ones above to mitigate the risk associated with the “crisis”. In order for this to be accomplished, facilities should take the time to consider what issues they have and what steps they need to take, and then create a project plan for how they are going to get there. It should contain the who, what, and when and should take into account all other initiatives that could be running in parallel and be using the same resources.  We can have the greatest ideas in the world, but if they are not executed in an effective manner, the results we require will escape us.

Friday, July 8, 2011

Best Practice is not Good Enough: Are You Moving Toward "Future Perfect"?

I was listening to an individual talking about business and improvement this week and he said a few things that resonated with me. I thought I would share them today in the context of reliability improvement.
Improvement initiatives should move all elements toward the perfect state, not just better than the competition or better than your sister plant or even best practice. He acknowledged that the perfect state may be unattainable, financially or otherwise, but it should still be the point you strive for. Many organizations map their current state processes and then make only the smallest of tweaks to these processes, calling them the future state (because they believe they are already pretty good). In the ultra-competitive business environment, what they really need is a step change in performance to remain in or regain the highly profitable top spot. If the same organization were to compare against the perfect state, they would realize more of the possibilities available to them, leading to better overall results.
So that got me thinking: what are some of the “future perfect” states of reliability, and what could we learn from them…
Zero Unplanned Downtime / Zero Breakdowns
All work is planned and scheduled.
Through the use of Root Cause Analysis (RCA) and Reliability Centered Maintenance (RCM), the site could move closer to this state of zero unplanned downtime by eliminating recurring failures with a failure-mode-based maintenance strategy.
Zero Parts Inventory
In a fully planned and scheduled world, parts can arrive in a just-in-time (JIT) manner. In this perfect state, parts arrive at the job site just minutes before they are needed. Now we have eliminated the carrying cost of all the spare parts that would normally reside on the books and in the storeroom.
A perfect balance of labor and required work backlog
Many facilities shoot for 10-12 percent maintenance overtime to ensure that they have enough people to get the volume of work done without having too many idle hands when the backlog is light. What if we could eliminate the need for overtime because our plans were so accurate and our schedules were so exact that we were left with a balanced equation every time?
What others can you think of? Even though you can’t necessarily attain them all, how does the “future perfect” change your perspective of what is possible?
Thanks, Rich MacInnes, for spurring the thoughts!