In their search for the latest and greatest management and technical innovations, which typically arrive with great fanfare, organizations often overlook opportunities to correct deficiencies so ingrained that they are accepted as normal. Even simple changes can profoundly affect an organization’s effectiveness and bottom line. Clearly, an organization can’t ignore highly innovative technologies and business practices that might profoundly affect its performance. However, management must make it a priority to eliminate the controllable deficiencies that the organization has come to accept as normal, and it must support this effort with resources.

The problem-solving method used to uncover, describe and eliminate the roots of these deficiencies is generally referred to as Root Cause Analysis (RCA), or Root Cause Failure Analysis (RCFA). Machinery lubrication professionals can and should use this powerful technique to seek out and eliminate the underlying reasons for technical problems and/or human errors.

Regrettably, most organizations that depend upon heavy equipment to achieve their mission are riddled with lubrication problems. In some cases, managers incorrectly attribute lubrication-induced machine failures to another cause, or blame lubrication for failures that had other causes. It is human nature to fit a current failure event into a familiar framework one has seen before, often with only surface knowledge of the specific event. In other cases, the organization has simply accepted poor machine or lubricant life as normal. In either case, removing the deficiencies can translate into significant bottom-line improvements, often with little investment of capital or manpower.

Figure 1. Root Cause Analysis Process

Sometimes lubricant failure is detected in time to avoid damage to the host system. In other cases, the host system fails because the lubricant or lubrication system fails to perform effectively. In still other cases, the lubricant and/or lubrication system offers valuable clues about a failure that was caused by some other forcing function. In all cases, the lubrication professional should be well grounded in the RCA process and its supplementary tools and methods in order to recognize the fundamental reason for the problem and implement a root cause solution.

Root Cause Analysis Process
Several methodologies have been advanced for performing RCA, or RCFA. While the fine details of the various approaches differ, most of them share the most important elements. One excellent (and free) resource is DOE-NE-STD-1004-92, the DOE Guideline for performing RCA. It was developed in 1992 by the U.S. Department of Energy, Office of Nuclear Energy, and can be downloaded from the Internet in PDF format. R. Keith Mobley’s book “Root Cause Failure Analysis” is also an excellent resource. Mobley wrote the book for plant engineers, and he uses terms and language familiar to lubrication professionals and reliability engineers.

RCA is detective work at its finest. It requires collecting and analyzing evidence in order to solve the crime, and it employs deductive reasoning much as Sherlock Holmes might have. To be effective, RCA must be carried out systematically. Following is a general description of the five-phase process described in the DOE Guideline for performing RCA (Figure 1).

This process is intended to enable the lubrication professional to more ably conduct lubricant failure investigations, or to support machine failure investigations where the lubricant and/or lubrication system is a suspect, or where it might offer valuable clues.

Phase I - Data Collection
Perhaps the most important step in the RCA process is to preserve the crime scene and collect clues about the failure. Unless it compromises safety or the quality of information, begin to collect data immediately after the failure occurs, or while the failure is in progress if possible. Make every effort to preserve all data and physical evidence of the failure. The lubricant and lubrication system offer a tremendous amount of information about the failure, and this information is often overlooked or unknowingly discarded. Be sure to collect this vital data and physical evidence, which includes, but is not limited to, the following items:

Lubricants - Often, mechanical failure is in one way or another tied to the lubricant. Collect the lubricant for further evaluation: the lubricant is an unquestionable repository of valuable clues. It can reveal the presence of contaminants or degradation byproducts that may be analyzed. The lubricant may also contain wear particles that can be analyzed to reveal clues about the physical activity that occurred on the component’s surface leading up to the failure event. Remember, a wear particle is the mirror image of a defect left on a component’s surface when wear occurs. Analysis of the particle’s size, shape, color, surface detail and other qualitative information reveals information about the wear mechanism leading up to the failure, and often the root cause.

Filters and Separators - Like the oil, the filter collects debris generated leading up to and including the catastrophic event. However, unlike the lubricant, which provides a snapshot of the system at the time the sample is taken, the filter can serve as a history book of events from the time it was installed up to the failure. Similarly, the effluent from centrifugal separators might also contain clues about the failure. Often, excessive particle contamination causes the wear that leads to failure, or causes valves to jam or stick. Inspect the filter, along with its manifold or housing, to identify common filter failures, including media failure or rips, seam failure, end-cap failure, bypass valve failure, etc. Also inspect vacuum dehydration units, centrifugal separators, coalescing separators, electrostatic separators, ion exchange resins, fuller’s earth, etc., to ensure they are functioning properly.

Gums, Resins, Varnish, Sludge and Other Deposits - If the lubricant was badly degraded, or if it reacted chemically with coatings, seals, environmental contaminants, process contaminants, etc., clues about the reaction will often reside within the gum, resin, varnish, sludge or other deposits that are its byproducts. Grease lines and bearing housings occasionally contain grease thickener that has separated from the oil, a condition that should be noted; sample this caked material for analysis as well. Simple tests such as elemental analysis can sometimes be performed on these residues to identify their likely sources, such as degraded additives, grease matrix materials or ingressed contaminants.

Oil Analysis - Include all previous oil analysis reports when collecting condition-monitoring information. Also, be sure to call the oil analysis laboratory to find out whether it held over the last sample. Labs commonly hold samples for some time after performing the initial analysis in case a problem arises and a retest is needed. You may want to run some nonroutine tests on this held-over oil to uncover any evidence of the incipient failure it contains. Call the lab as soon as possible; samples are held over only for a short time.

Tank or Sump Condition - Inspect the sump or tank for rust, varnish, water puddles, missing, damaged or saturated breathers, open hatch covers, damaged seals and gaskets, leaks, etc. Be sure to collect information about the lubricant level, including level sensors and low-level alarms.

Lubrication System - Evaluate the lubrication system, including the following components:

  • Pumps for functionality, including flow rate;
  • Hoses and piping, including flanges and connectors for obstructions and leaks;
  • Automatic lubrication systems, which can fail or become clogged;
  • Slinger rings and collar lubricators for functionality;
  • Constant level oilers for proper level and for flow obstructions;
  • Mist systems, including reservoir, nebulizer, piping, reclassifiers;
  • Spray lubricators for spray pattern, spray direction and flow volume; and
  • Drip and wick lubricators for flow settings and obstructions.

System Process Parameters and Observations - Assemble previously collected process parameter information including lubricant temperature, flow rate, percent saturation with water, online particle count and any other measured information. For engines, the presence of black or white smoke and other operational abnormalities should be noted. In hydraulic systems, cycle time and cylinder creep should also be noted.

Whenever a component is removed or replaced, it should be carefully scrutinized. It is particularly important to inspect components before and after cleaning, because residues found on uncleaned bearings, gears, etc., may offer vital clues to the real root cause of the failure.

Wherever possible, take photographs and, when appropriate, video to supplement the collection of physical evidence. Today’s digital cameras allow the inspector to shoot hundreds of high-resolution photographs before uploading them to a computer. Likewise, if parts, wear particles or other physical evidence are examined microscopically, capture images where possible. As the old adage states, a picture is worth a thousand words. This is especially true during a failure investigation, when physical evidence must be captured before it is spoiled.

In addition to collecting data and physical evidence, it is necessary to interview operators, mechanics, engineers, supervisors and managers to collect the softer information. According to the DOE standard, “interviews must be fact-finding and not fault-finding.” Interview the people most familiar with the machine and the situation; often these are the operators and lube technicians who work with the equipment daily. Include a walk-through if possible. As Peter Falk’s Lt. Columbo demonstrated in the TV detective series of the same name, being near the scene of the crime often jogs the memory and brings useful information to the conscious mind. Conduct interviews as quickly as possible: over time, if the incident doesn’t match an individual’s idea of what should have happened, the mind tends to distort his or her perception of the facts to better fit that paradigm.

Prepare a list of questions before the interview to guide data collection, but leave room for free-form thoughts and discussion. Generally, the questionnaire should uncover information about the items described in Table 1.
Table 1.

The items listed are by no means an exhaustive list of all the lubricant and lubrication system data that might be collected during or after a failure, but they are generally applicable to the lubricated machines found in most plants. Mobley suggests that the lack of a formal process for reporting and collecting data on a failure event severely limits the effectiveness of RCA. It is sensible to create such a form specifically for the lubricant and lubrication system data that should or could be collected. Again, refer to the items listed in Table 1 for guidance.
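One way to keep such a form consistent from one investigation to the next is to give it a fixed structure. The sketch below is a hypothetical illustration, not a form taken from Mobley or the DOE guideline: the class name `LubeFailureRecord` and every field name are assumptions loosely based on the evidence items listed above, and the helper simply reports which evidence checkboxes are still unticked.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime


@dataclass
class LubeFailureRecord:
    """Hypothetical lubricant/lubrication-system section of a failure report.

    Field names are illustrative; a real form should follow the items in
    Table 1 and the plant's own equipment naming.
    """
    asset_id: str
    event_time: datetime
    # Evidence checkboxes (True once the item has been collected or done)
    oil_sample_taken: bool = False
    filter_retained: bool = False
    deposits_sampled: bool = False
    lab_holdover_requested: bool = False
    # Free-text observations and attachments
    sump_notes: str = ""
    lube_system_notes: str = ""
    photos: list[str] = field(default_factory=list)
    interviewees: list[str] = field(default_factory=list)

    def missing_evidence(self) -> list[str]:
        """Names of evidence checkboxes that have not been ticked yet."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), bool) and not getattr(self, f.name)
        ]


record = LubeFailureRecord("P-101", datetime(2024, 3, 5, 14, 30), oil_sample_taken=True)
print(record.missing_evidence())
# ['filter_retained', 'deposits_sampled', 'lab_holdover_requested']
```

Listing the missing items right after the event is meant to prompt the investigator to retain the filter or call the lab before that evidence is discarded.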

Phase II - Assessment
The assessment phase involves analyzing all collected data to identify causal factors. The major failure cause categories are detailed in Table 1, along with some lubricant- and lubrication system-related examples. According to DOE-NE-STD-1004-92, the objective of the assessment phase is to identify the problem and its significance, then to work progressively through the possible causes until the fundamental root cause or causes are defined at the highest level of resolution, meaning that the event cannot reasonably be reduced any further (Figure 2).

Figure 2. Failure Assessment Process Flow Model

Consider the simplified example of a pump failure in Figure 3. After determining that the event is significant, one begins by identifying possible causes at the first level in the sequence, eliminating nonapplicable causes and leaving only the cause or causes that apply to the situation. In our example, the pump failure might have been attributed to the pump, the motor or the coupling; in this case, it turned out to be the pump.

The process then moves to the next causal level, which in our case is to determine whether a bearing or the impeller failed; here, it was the bearing(s).

The next level is to determine whether the failure was caused by misalignment, contamination, degraded lubricant, etc. In our example, oil analysis data showed that the bearing failed due to wear induced by dirt contamination. Dirt can enter or remain in the lubricant because the new oil is dirty, the breather is missing or ineffective, the seals are failing to exclude it or the filter is not effectively removing it. In our case, it is a filter failure, and this is where the system evaluation and interview process comes into play. A filter can fail to perform because it is damaged or because it is full.

In our example, based on evidence provided by the lube technicians, a full filter was the culprit. A full filter can be attributed to a lack of inspection or ineffective inspection, an incomplete work order to change the filter, a failed differential pressure gauge or the lack of a differential pressure gauge. In our example, again based on a system inspection, a failed differential pressure gauge was the root cause of the pump failure. The corrective action should therefore be to replace the gauge and to implement routine inspections to test the gauges in the future.
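The level-by-level elimination in this pump example can be written down as data, which makes the reasoning chain easy to review and to attach to the RCA report. The sketch below is a hypothetical illustration: the names `PUMP_EXAMPLE` and `drill_down` are assumptions, and the "confirmed" entries simply restate the conclusions reached in the text rather than any real dataset.

```python
# The pump example as a sequence of cause levels. At each level, evidence
# (oil analysis, system inspection, interviews) confirms one candidate and
# rules out the others. Labels paraphrase the text; nothing here is real data.
PUMP_EXAMPLE = [
    ("component", ["pump", "motor", "coupling"], "pump"),
    ("pump part", ["bearing", "impeller"], "bearing"),
    ("bearing failure cause",
     ["misalignment", "contamination", "degraded lubricant"], "contamination"),
    ("dirt source", ["dirty new oil", "breather", "seals", "filter"], "filter"),
    ("filter failure", ["damaged", "full"], "full"),
    ("why full filter went unchanged",
     ["no or ineffective inspection", "incomplete work order",
      "failed DP gauge", "no DP gauge"],
     "failed DP gauge"),
]


def drill_down(levels):
    """Return the chain of confirmed causes, top level first, root cause last."""
    chain = []
    for level, candidates, confirmed in levels:
        if confirmed not in candidates:
            raise ValueError(f"{confirmed!r} is not a candidate at level {level!r}")
        ruled_out = [c for c in candidates if c != confirmed]
        print(f"{level}: {confirmed} (ruled out: {', '.join(ruled_out)})")
        chain.append(confirmed)
    return chain


chain = drill_down(PUMP_EXAMPLE)
print("Root cause:", chain[-1])  # Root cause: failed DP gauge
```

The check that each confirmed cause appears among the candidates is a small guard against skipping a level; it does not replace the evidence needed to confirm each step.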

Numerous methodologies are available for completing this cause-effect analysis once a failure has been identified and deemed significant and worthy of further investigation. The methods vary in their sophistication, but they all focus on establishing clear cause-effect relationships. Below is a general description of some of the more common techniques:

Fault Tree Analysis - Arguably the most popular and sophisticated technique for analyzing failures, fault tree analysis (FTA) is a deductive reasoning technique that may be employed before a failure as a design tool (usually in conjunction with, or in lieu of, failure modes and effects analysis, or FMEA), or after the fact as a failure analysis tool. According to the international standard IEC 1025 (since renumbered IEC 61025), “FTA is concerned with the identification and analysis of conditions and factors which cause or contribute to the occurrence of a defined undesirable event, usually one which significantly affects system performance, economy, safety or other required characteristics.” FTA produces a tree that starts with a top event and progresses logically downward until the limit of resolution is reached, revealing the root cause or causes. This is the approach used in the example in Figure 3.
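When FTA is used quantitatively, each basic event is given a probability, and the gates combine those probabilities up the tree. Assuming independent events, an AND gate multiplies its input probabilities, and an OR gate gives one minus the product of the complements. The sketch below is a minimal illustration under that independence assumption; the tree and every probability in it are hypothetical, only loosely modeled on the pump example, and the names `Event`, `Gate` and `probability` are not from any standard or library.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Union


@dataclass
class Event:
    """Basic event with an assumed probability over the period of interest."""
    name: str
    p: float


@dataclass
class Gate:
    kind: str             # "AND" or "OR"
    inputs: List["Node"]


Node = Union[Event, Gate]


def probability(node: Node) -> float:
    """Probability of a node, assuming all basic events are independent."""
    if isinstance(node, Event):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(ps)                          # every input must occur
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in ps)   # at least one input occurs
    raise ValueError(f"unknown gate type {node.kind!r}")


# Hypothetical tree and probabilities, loosely based on the pump example.
filter_not_changed = Gate("OR", [Event("DP gauge failed", 0.10),
                                 Event("inspection skipped", 0.05)])
contamination_wear = Gate("AND", [Event("dirt ingress", 0.50), filter_not_changed])
top = Gate("OR", [contamination_wear, Event("misalignment", 0.02)])

print(round(probability(top), 5))  # 0.09105
```

Real fault trees often contain shared or dependent events, for which this simple recursion overstates or understates the result; dedicated FTA methods (for example, minimal cut sets) handle those cases.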

Cause-and-Effect Analysis - This technique is widely called fishbone analysis because of the fish-shaped diagram it produces. A typical cause-and-effect analysis identifies the human and mechanical factors, methods and materials that might have produced the effect or were otherwise undesirable. The technique has been criticized for lacking a clear description of the sequence of events.

Other techniques described in the literature include sequence of events analysis, events and causal factor analysis, change analysis, barrier analysis, management oversight and risk tree analysis, human performance evaluation and the Kepner-Tregoe problem-solving and decision-making method.

In addition to implementing processes and procedures for in-house staff to assess failure events, it may be necessary to bring additional expertise into the process. In some cases, especially complex failure investigations, it is advisable to seek out individuals who have a deep understanding of the failure investigation process itself and who can help you avoid mistakes. Likewise, it may be necessary to engage individuals knowledgeable about the particular failure type and/or root cause(s), again to shorten the process and to avoid errors or omissions, whether of fact or of process steps, that stem from a lack of detailed in-house knowledge.

The lubrication professional’s failure investigations will likely fall into one of the following three distinct scenarios:

  1. Those where the machine itself has failed and the lubricant or lubrication system is suspected as either the root cause, or at least a contributing cause;
  2. Those where the machine has failed due to some other cause, and the lubricant and lubrication system offer clues about the nature of the failure; and
  3. Those where the lubricant has failed or is failing and the machine, although it has not yet failed, is at risk.

The first two categories are reactive in nature: functional failure of the machine has occurred, and the investigation focuses on understanding the event in order to take corrective action and prevent its recurrence. The third category, however, is proactive. By detecting a lubricant or lubrication system failure before a functional failure of the lubricated system, one may take preemptive action to eliminate the defect before the machine is damaged or fails.

The term lubrication failure is widely abused in industry. It is generally applied to any failure in which the lubricant is suspected, and in some cases it is assigned as a matter of convenience simply because no other cause was readily apparent. Ineffective lubrication often lies at the root of mechanical wear and failure, but one must develop a clearer understanding of lubrication failures and investigate each one individually. There is no single definition of lubrication failure; rather, there are multiple possible failures with multiple possible causes. Evaluate each significant failure independently of previous failures, resisting the temptation to casually apply the scenario from a previous failure to the current one.

Common lubrication-related failure modes are described here to give the lubrication professional an understanding of the range and breadth of common problems. Depending upon the nature of the event, lubricant or lubrication system failures might be attributed to material problems, procedural deficiencies, personnel errors, design flaws, training deficiencies, management problems or external phenomena.

Table 2.

Table 2 maps common lubricant failures to the various problem areas presented in Table 1.

Insufficient Lubricant Volume - This is a broad category. The condition can be proactively detected, analyzed and corrected before the machine fails, or afterward, depending upon how early the failure is detected. Described below is a partial list of possible scenarios.

  1. Sudden volumetric loss of the lubricant - Although it is uncommon, a machine can rapidly leak its lubricant or have the lubricant rapidly and accidentally pumped out of the system. For example, a sudden hose or piping rupture could be attributed to material problems if the hose or piping fatigued, or to design problems if the hose or pipe material was not properly selected for the application. Likewise, the problem might be attributable to inadequate procedures if clear, easy-to-follow instructions don’t exist. There are dozens of possible scenarios that might explain a sudden loss of lubricant.
  2. Low levels - This scenario is very common. For instance, if the machine is designed and/or installed in such a way that the lube tech cannot reach particular lubrication points for regreasing or topping up without some disassembly, the job probably won’t get done without a shutdown, which may put the lube tech at odds with production supervisors. Alternatively, the lack of a detailed route and procedure, along with appropriate training to carry out the required tasks, can lie at the root of low lubricant levels. Again, dozens of possible scenarios might explain why lube levels are low.
  3. Insufficient delivery volume - While the tank or sump may be filled to the appropriate level, the lubricated components may not receive a sufficient volume of lubricant. For example, the pump may have failed or may be failing, spray nozzles might be clogged, the slinger ring might be damaged, etc. The lubrication system design may simply be wrong for the machine or the application, or, in large lubricating systems, the flow rate might be turned down to dangerously low levels to reduce leakage. This could be the result of a procedural deficiency if the required flow volume is not clearly defined, a personnel error if the procedure was not carried out properly, or a training deficiency if the individual used poor judgment in reducing leakage at the risk of the lubricated components. One might also attribute this failure to the design flaws or material problems that caused the excessive leakage in the first place. There are a multitude of possible scenarios in this category.

Excessive Lubricant Volume - Excessive lubrication is common in machines, particularly greased bearings.

  1. Manually applied - Manually applying too much lubricant is a common problem, particularly in greased bearings. Overgreasing bearings results in exterior seal damage, intrusion into the motor where the grease causes deterioration of the motor’s insulation, churning-induced heat and, in some cases, rapid bearing failure in the event that highly pressurized grease pushes the bearing’s shields into the rolling elements. Common reasons for manually applying too much lubricant include: no procedure, procedures which lack accurate detail about relubrication volume and interval, problems where the grease gun delivery volume is unknown or varies from gun to gun, lube techs who aren’t properly trained, insufficient work planning systems, etc.
  2. Automatically applied - This is a less common problem, but not at all unusual. For example, if an automatic lubrication system does not include a relay switch to cut off lubrication when the machine shuts down and reengage it when the machine starts up again, the components it lubricates are susceptible to overlubrication. Similarly, if a circulating oil-lubricated machine delivers oil at too high a flow rate for the oil’s viscosity and operating temperature and for the machine’s geometry and speed, oil whip might be induced, causing the shaft to become eccentric within the bearing.
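Both overlubrication scenarios above come down to simple arithmetic. The hypothetical sketch below first calibrates a grease gun by weighing the grease from a counted number of strokes and converts a target quantity into strokes. For the target it uses the widely quoted rule of thumb G = 0.005 * D * B (grams, with bearing outside diameter D and width B in millimeters) as a stand-in for the bearing manufacturer’s figure. It then models a timer-driven automatic lubricator, assuming the run interlock simply pauses the timer while the machine is stopped. All function names, dimensions and intervals are assumptions for illustration.

```python
def grams_per_stroke(weighed_mass_g: float, strokes: int) -> float:
    """Calibrate one grease gun: weigh the grease from a counted number of strokes."""
    if strokes <= 0:
        raise ValueError("strokes must be positive")
    return weighed_mass_g / strokes


def replenishment_g(outer_diameter_mm: float, width_mm: float, factor: float = 0.005) -> float:
    """Rule-of-thumb relubrication quantity G = 0.005 * D * B (grams).
    Prefer the bearing manufacturer's figure when one is available."""
    return factor * outer_diameter_mm * width_mm


def strokes_for(required_g: float, per_stroke_g: float) -> int:
    """Whole number of strokes closest to the required mass (at least one)."""
    return max(1, round(required_g / per_stroke_g))


def pulses_delivered(running_by_minute, interval_min: int, interlocked: bool) -> int:
    """Count pulses from a timer-driven automatic lubricator.

    running_by_minute holds one bool per minute (True while the machine runs).
    Without an interlock the timer runs continuously; with one, the timer is
    assumed to pause while the machine is stopped and resume on restart.
    """
    timer = 0
    pulses = 0
    for running in running_by_minute:
        if interlocked and not running:
            continue  # lubricator held off while the machine is down
        timer += 1
        if timer >= interval_min:
            pulses += 1
            timer = 0
    return pulses


# Hypothetical bearing (D = 110 mm, B = 27 mm) and gun (10 strokes weighed 14.0 g).
per_stroke = grams_per_stroke(14.0, 10)   # 1.4 g per stroke
need = replenishment_g(110, 27)           # about 14.85 g
print(strokes_for(need, per_stroke))      # 11

# Hypothetical day: 8 h running, 16 h stopped; one pulse per 60 min of timer time.
day = [True] * 480 + [False] * 960
print(pulses_delivered(day, 60, interlocked=True))   # 8
print(pulses_delivered(day, 60, interlocked=False))  # 24
```

In this made-up case the lubricator without an interlock delivers three times the grease the running hours call for, and a gun that delivers a different mass per stroke would shift the manual stroke count just as easily, which is why calibrating each gun matters.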

Wrong Lubricant - Again, a common problem that might be attributable to material problems, procedural deficiencies, human error, design flaws, training deficiencies, management problems or external phenomena, depending upon the nature of the event. Described below is a partial list of possible scenarios.

  1. Wrong lubricant added to machine - A common scenario. For example, the wrong lubricant might be used during an oil change or a top-up because the lubricant transfer containers are improperly marked, the machine is improperly marked, the work order and/or procedure did not contain sufficient detail, and/or the lube tech failed to follow directions properly. Also, transferring the lubricant through common hoses and hardware can lead to the inadvertent addition of the wrong, and sometimes incompatible, oil. This last example might also be classified as failure to exclude contaminants.
  2. Lubricant improperly labeled - While uncommon, new lubricants occasionally arrive at the facility from the supplier mislabeled. This scenario is usually attributable to the manufacturer or distributor during the packaging or delivery process. However, if the organization has internal checks in place to guard against such a mishap and the mislabeled product still reaches a machine, the failure can be attributed to procedural deficiencies, human error, training deficiencies or management problems.
  3. Wrong lubricant used during initial fill - While not prevalent, this scenario is far from unheard of, and it can occur with either new or rebuilt machines. For example, most facilities outsource motor rebuilds. In the absence of a clearly defined requirement, most motor rebuild shops will fill the bearings with polyurea grease, typically an excellent choice for motors. However, the plant may use a multipurpose lithium complex grease to relubricate the motor’s bearings. Because polyurea and lithium complex greases aren’t compatible (nor, for that matter, are all polyurea thickeners compatible with one another), the mixed grease softens considerably and oozes out of the bearings onto the floor or into the motor’s windings.
  4. Incorrect lubricant specification - Frequently, the specified lubricant lacks the performance properties required to meet the demands of the application. This is particularly prevalent where generic lube specifications, perhaps from an OEM, are used without carefully considering all factors, including the operating environment. Likewise, lubricants often possess properties that are not required and, in some cases, are detrimental. Where required properties are lacking, it might be necessary to produce a detailed list of performance requirements based on the challenges of the environment and application, and on the organization’s goals, such as extended drains. Once this list is compiled, it may be necessary to conduct detailed testing to identify the best available candidate. Conversely, machines often use expensive lubricants whose special properties go to waste because the application doesn’t warrant them. Or, the lubricant may possess properties that cause problems of their own, such as an aggressive extreme pressure (EP) additive in a worm gearbox with a bronze ring gear, where the bronze is susceptible to corrosive attack by the additive.

Contaminated Lubricant - This is perhaps the most common cause of machine wear and failure. Contaminants may temporarily affect the performance of the lubricant, or catalyze chemical reactions that materially change the lubricant’s physical, chemical and performance properties. Described below is a partial list of possible scenarios.

  1. Ineffective or insufficient exclusion - Contaminants, in their various forms, can enter the machine in numerous ways. Often, contaminant ingress is highly controllable with minor modifications to the system, such as upgraded seals or breathers, or by developing and implementing appropriate procedures, including properly training staff. For example, poor vent breather performance and short element life might be a direct result of the breather’s location, where it is exposed to large amounts of dirt or to direct water spray during equipment wash-downs. Simply adding a pipe or hose to relocate the element to clean air, out of the direct path of water spray, may considerably increase the element’s performance and life.
  2. Ineffective or insufficient removal - Filtration and contaminant separation systems often fail or do not produce the desired results. Filters are often improperly located to achieve optimum contamination control. Or they may be undersized, so that the initial DP (differential pressure) is too high to allow any hope of long element life. Likewise, offline (kidney-loop) filtration systems have their own pumps, motors, valves, hoses and connections, any of which can fail. The machine may be set up with insufficient filtration, or it may require a supplementary separation technology to remove water, acid, sludge, etc., from the oil.
  3. Insufficient resistance - Occasionally, the lubricant reacts or coexists poorly with contaminants that can’t be completely controlled in the machine. For instance, in natural gas compressors the natural gas must be effectively separated from the lubricant in the aftercooler. The hydrocarbon-based natural gas is soluble in mineral oil, synthesized hydrocarbon oils and many ester-based fluids, but it does not dissolve readily in polyglycol-based lubricants, which makes them a good choice for this application. Likewise, water hydrolyzes ester-based fluids, creating an acidic environment. Susceptible ester-based fluids, such as diesters and phosphate esters, should be avoided where water cannot be eliminated, unless, of course, safety dictates the use of a phosphate ester for fire resistance.
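Several of the removal problems above show up in the filter’s differential pressure record: a high DP on a clean element suggests an undersized filter, a DP approaching the bypass setting means the element is loaded, and a reading that never moves may mean the gauge itself has failed, as in the pump example. The sketch below screens a series of readings for those three patterns. It is a hypothetical illustration: the function name and all thresholds (expressed as fractions of the bypass valve’s cracking pressure) are assumptions, not filter-manufacturer guidance.

```python
def screen_filter_dp(readings_kpa, bypass_kpa,
                     clean_fraction=0.25, alarm_fraction=0.80, stuck_span_kpa=0.5):
    """Screen filter differential-pressure readings, oldest first.

    All thresholds are hypothetical fractions of the bypass valve's cracking
    pressure; substitute the filter manufacturer's guidance.
    """
    if not readings_kpa:
        raise ValueError("need at least one reading")
    flags = []
    # First reading is taken as the clean-element DP
    if readings_kpa[0] > clean_fraction * bypass_kpa:
        flags.append("high initial DP: filter may be undersized")
    # Latest reading compared with the bypass setting
    if readings_kpa[-1] >= alarm_fraction * bypass_kpa:
        flags.append("DP approaching bypass setting: change the element")
    # A flat trend over several readings is only a prompt to test the gauge
    if len(readings_kpa) >= 3 and max(readings_kpa) - min(readings_kpa) < stuck_span_kpa:
        flags.append("DP has not moved: verify the gauge is working")
    return flags


print(screen_filter_dp([50, 90, 140], bypass_kpa=170))
print(screen_filter_dp([30.0, 30.1, 30.2, 30.1], bypass_kpa=170))
```

With these made-up numbers, the first series raises the undersized and change-element flags, while the second raises only the gauge flag. A flat DP can be legitimate in a lightly loaded system, so the flag calls for a gauge check rather than a conclusion.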

Lubricant Failure - Lubricants don’t last forever. They eventually wear out and must be changed or reclaimed. However, if the lubricant degrades faster than normal, it might have been defective when new, or a new forcing function might have increased the rate of degradation.

  1. Base oil failure - Base oils usually degrade through oxidation. High temperatures accelerate the process and can even degrade the base oil in an oxygen-free environment. Likewise, exposure to incompatible chemicals can produce undesirable effects on the lubricant. Sometimes base oil degradation becomes intolerable because management seeks to extend drain intervals. In other cases, operational or environmental factors increase the stress on the lubricant. In both cases, it becomes necessary to upgrade the lubricant. Begin by detailing the required performance levels relative to the new objectives for the lubricant and/or the new stresses to which it is exposed. It may be necessary to perform specialized testing on candidate oils to determine which will perform best given the requirements and application-related stresses. If an incompatible chemical produces an undesirable reaction, a thorough investigation of the system and its environment relative to the fluid is needed to determine what reaction might have occurred. In this situation, it is imperative to collect samples of any reaction byproducts (sludge, scabs, deposits, etc.), because the clues to the reaction are most likely to be found in its products. An investigation into a base oil failure, whether to detect an external problem or to select a more appropriate product for the application, often requires a detailed, specialized test slate and a robust experimental design.
  2. Additive failure - As with base oil degradation, additive depletion can become a problem because of changes in demand or a new forcing function. Today, oil drains are being extended further than ever, especially in large rotating equipment, which often takes the lubricant into new and untested lengths of service. Likewise, lubricant formulations are frequently changed to suit new base oil types, etc. In some cases, the new formulation is not fully compatible with the legacy oil, and it takes some in-service time for this problem to surface. When lubricants are stored incorrectly, the additives can degrade or separate out of the oil. Regardless of its nature, an additive deficiency is serious, because additives provide or enhance many of the lubricant’s performance properties. It may be necessary to conduct detailed, out-of-the-ordinary lubricant testing to reveal an additive problem and to develop a suitable solution. As with base oil degradation, an investigation into an additive failure, whether to detect an external problem or to select a more appropriate product for the application, often requires a detailed, specialized test slate and a robust experimental design.
  3. Grease thickener failure - Each grease thickener has advantages and disadvantages. Some resist thermal stress, others resist water washout, still others resist shearing down, etc. For example, if grease sits for long periods in long distribution lines that run adjacent to a hot machine, the oil may separate from the thickener and cake up the line, or the grease may actually reach its dropping point (the temperature at which grease permanently transitions from a semi-solid to a fluid). In this case, the line could be rerouted to correct the problem, or the grease might need to be respecified to withstand the environmental stress. Again, an investigation into a grease thickener failure, whether to detect an external problem or to select a more appropriate product for the application, may require a detailed, specialized test slate and a robust experimental design.

Abnormal Wear Debris Generation - Wear debris analysis assesses the condition of the machine and can help evaluate machine failures whether or not the lubricant is suspected as a root cause. The technique's power lies in the fact that a wear particle generated by a failed or failing machine is a mirror image of the component surface that produced it. The debris can be extracted from the fluid, the filter or the effluent from centrifugal separators to determine its metallurgy and to analyze numerous qualitative aspects of the particle's appearance. Common wear debris analysis tests include atomic emission spectroscopy, ferrous density analysis and optical microscopy. During failure investigations, however, it is often advisable to employ more specialized methods such as X-ray fluorescence spectroscopy, scanning electron microscopy, X-ray crystallography and other detailed metallurgical tests.

Phase III - Corrective Actions
To effectively improve reliability and safety, the root cause analysis process must produce corrective actions. These corrective actions should be aimed at preventing recurrence of the problem while remaining feasible to implement, consistent with the organization's mission and free of new, unacceptable risks. Before taking corrective action, the organization should weigh the consequences of implementing the actions against the consequences of not implementing them. In particular, it should compare the capital, engineering, training, operational, risk-based and other costs against the expected benefit, which is the value of eliminating recurrence of the failure multiplied by the probability that the corrective actions will in fact prove effective.
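The cost-benefit weighing described above can be expressed as a simple expected-value calculation. The Python sketch below shows one hypothetical way to do it. It sums the cost categories the text lists and subtracts them from the benefit of eliminating recurrence multiplied by the probability that the action will prove effective. The `CorrectiveAction` fields and the `expected_net_benefit` function are assumptions made for illustration, as is the treatment of all costs in a single currency over a single time horizon. The numbers in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class CorrectiveAction:
    """Hypothetical record of one proposed corrective action (all costs in one currency)."""
    name: str
    capital_cost: float
    engineering_cost: float
    training_cost: float
    operational_cost: float
    risk_cost: float              # assumed monetized estimate of any new risk introduced
    probability_effective: float  # chance the action actually prevents recurrence, 0.0 to 1.0

    def total_cost(self) -> float:
        return (self.capital_cost + self.engineering_cost + self.training_cost
                + self.operational_cost + self.risk_cost)


def expected_net_benefit(action: CorrectiveAction, recurrence_benefit: float) -> float:
    """Benefit of eliminating recurrence, weighted by the chance the action works, minus its costs."""
    if not 0.0 <= action.probability_effective <= 1.0:
        raise ValueError("probability_effective must be between 0 and 1")
    return recurrence_benefit * action.probability_effective - action.total_cost()


if __name__ == "__main__":
    # Illustrative numbers only; they are not drawn from any real failure.
    reroute = CorrectiveAction("Reroute grease line away from heat source",
                               capital_cost=8_000, engineering_cost=2_000,
                               training_cost=500, operational_cost=0,
                               risk_cost=0, probability_effective=0.9)
    print(expected_net_benefit(reroute, recurrence_benefit=50_000))
```

A positive result suggests that implementing the action is worth more than leaving it out, and a negative one suggests the reverse. Safety and mission constraints still have to be judged separately, because a single number cannot capture them.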

Phase IV - Inform
It is necessary to inform all parties of the correction, particularly of changes that will affect them. This includes management, supervisors, engineering, operations and maintenance personnel, as well as affected suppliers, consultants and subcontractors. It is also appropriate to notify other locations within the company of the findings and recommendations from the RCA process so they can evaluate the information relative to their unique situation and implement the corrective actions where applicable. If the failure is safety- or environment-critical, other organizations within and outside the industry should be notified of the findings. Yes, even competitors should be notified. It is good corporate citizenship to do so, and it is the only ethical choice.

Phase V - Follow-up
It is necessary to follow up to ensure that the corrective actions were properly implemented, are functioning as intended and have in fact eliminated the problem. Should the problem recur, reevaluate the original occurrence to determine why the corrective actions were not effective. Identify and analyze the differences between the first and second events to see what has changed. Then revise the original corrective actions as required, or add further corrections to address problems caused by any of the variables that differ.
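Comparing the first and second events is essentially a change analysis: list every recorded variable and flag the ones that differ. The Python sketch below assumes, hypothetically, that each event has been documented as a dictionary that maps a variable name to its observed value. The `changed_variables` function, the `MISSING` marker and the example variables are illustrative inventions and are not part of any standard RCA tool.

```python
from typing import Any, Dict, List, Tuple

# Marker used when a variable was recorded for only one of the two events.
MISSING = "<not recorded>"

# Hypothetical event record: variable name -> value observed at the time of the event.
EventRecord = Dict[str, Any]


def changed_variables(first: EventRecord, second: EventRecord) -> List[Tuple[str, Any, Any]]:
    """Return (variable, first-event value, second-event value) for each variable that differs."""
    differences = []
    # Union of both key sets, sorted so the report order is stable.
    for name in sorted(set(first) | set(second)):
        before = first.get(name, MISSING)
        after = second.get(name, MISSING)
        if before != after:
            differences.append((name, before, after))
    return differences


if __name__ == "__main__":
    # Invented example records for two occurrences of the same failure.
    first_event = {"lubricant": "ISO VG 68", "bearing_temp_c": 71, "drain_interval_h": 8000}
    second_event = {"lubricant": "ISO VG 68", "bearing_temp_c": 84,
                    "drain_interval_h": 8000, "new_supplier_batch": True}
    for name, before, after in changed_variables(first_event, second_event):
        print(f"{name}: {before!r} -> {after!r}")
```

Each flagged variable is a candidate explanation for why the original corrective actions did not hold. The sketch only finds differences and cannot judge which of them matter.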

Lubricants and the lubrication system can fail in many different ways. These failures might be attributable to problems with material, procedures, personnel, design, training deficiencies, management and/or external phenomena. Likewise, when a machine fails for other reasons, the lubricant or lubrication system offers clues about the failure and the events that led up to it. In many cases, the lubricant offers the only reasonable way to assess what happened to the component's surface prior to failure, because the component itself is usually mangled and compromised during the late stages of the failure.

Most organizations that rely upon heavy equipment to achieve their mission will have difficulty finding another improvement opportunity that matches the value proposition offered by excellence in precision lubrication. Lubricant and lubrication system failures are typically the chronic, recurring type. Fortunately, they can usually be uncovered, analyzed and corrected with minimal investment of resources, making them a perfect target for performance improvement initiatives.

Lubrication professionals should seek an understanding of root cause analysis processes. The methods are generally simple and intuitive. Standards, books and articles from experts on the topic are widely available, some of which can be found for free on the Internet. Learn these skills and integrate them with your knowledge of lubricants and lubrication systems to become your organization’s top-shelf failure and opportunity detective.
