As head of Noria's failure investigation group, I've led many interesting studies in search of failure root causes and remedies. These include missile system failures, highway accidents, helicopter crashes and turbine-generator wrecks. Most of these nearly 100 investigations were substantially hampered by errors made in collecting and preserving evidence.

We know that when critical failures occur, every effort should be made to prevent repeat performances. Without an intervention to remove the underlying root cause, however, a recurrence is almost guaranteed. It stands to reason that maintenance organizations should take failure investigations as seriously as they do the repair activities needed to return a machine to service. Yet all too often, once production has been restored, the urgency and memory of the failure begin to fade.

We've published extensively on the importance of root cause analysis (RCA) and the steps needed to carry out an RCA. This column does not address these well-documented procedures but instead focuses on the equally important task of preserving and collecting evidence. After all, this evidence is the essential raw material of the RCA process. The quality and completeness of the evidence (raw material) are arguably the central factors that determine the precision of the delivered result (the root cause and RCA end product).

Sadly, by the time I get a phone call to participate in an RCA, there is usually only a scintilla of evidence remaining. Perhaps there are a few fragments of a broken bearing or the shelled-out remains of a failed pump. In other cases, there might be photos of the crime scene taken by an alert technician. Of course, there is plenty of anecdotal evidence and personal theories from people who arrived first on the scene. But when it comes to collecting quality data and preserving physical evidence, what's available is usually pretty skimpy. My advice on preserving evidence is as follows:

What to Do When an Impending Failure is Suspected
Preserving the crime scene should begin with the first indication of a problem. Don't wait until you have a dead body to take action. Most data can be captured only when the machine is still running, so don't squander the opportunity. Create a log that will serve as a veritable timeline of all information that could be useful in the event the situation worsens. Some suggestions include:

  1. Save used oil filters as evidence in a well-marked sealed plastic bag. Note filter condition and abnormal collections (glitter, sludge, debris color, etc.). Record filter service life, especially if it's shorter than usual.

  2. Increase the frequency of oil sampling. Pull primary, secondary and bottom sediment and water (BS&W) samples. Some samples can be held in reserve and used only if needed by an RCA investigation.

  3. Begin looking for structural and mechanical faults such as misalignment, soft foot, resonance, looseness, bent shaft, etc.

  4. Step up vibration analysis by looking for faults and incipient conditions revealed by shaft speed, gear mesh frequency, vane/blade pass frequency, roller/ball pass frequency, cage frequency, etc.

  5. Perform frequent walk-down inspections. Check sight glasses (level, foam, color, emulsions, etc.), shaft wobble, oil color and clarity, BS&W bowls, sludge and varnish, pressure abnormalities, leaks, magnetic plugs, etc.

  6. Temperature is a good indicator of impending failure. Use infrared thermometers (heat guns), infrared cameras and dedicated temperature probes to look for thermal excursions.

  7. Machines emit an assortment of audible signals; some are normal but others are not. Report unusual whines, rattles, rumbles, pops, etc. Use acoustic instrumentation to isolate structure-borne ultrasonic emissions and other unusual sonic emission sources. Alternatively, a rod, garden hose or stethoscope could be helpful in localizing problems.
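
The fault frequencies named in item 4 can be estimated from shaft speed and basic geometry before a spectrum is read. The Python sketch below uses the standard textbook relations for a rolling-element bearing with a stationary outer race and a rotating inner race. The class and function names are hypothetical, and real dimensions should come from the bearing, gear or pump manufacturer's data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BearingGeometry:
    """Hypothetical container for bearing dimensions (any consistent length unit)."""
    n_elements: int                 # number of balls or rollers
    element_dia: float              # rolling element diameter
    pitch_dia: float                # pitch (cage) diameter
    contact_angle_deg: float = 0.0  # contact angle in degrees


def shaft_hz(rpm: float) -> float:
    """Shaft rotational frequency in Hz."""
    return rpm / 60.0


def bearing_fault_frequencies(rpm: float, geom: BearingGeometry) -> dict:
    """Estimate cage, ball pass and ball spin frequencies in Hz.

    Assumes a stationary outer race, a rotating inner race and no slip.
    """
    f = shaft_hz(rpm)
    ratio = (geom.element_dia / geom.pitch_dia) * math.cos(
        math.radians(geom.contact_angle_deg)
    )
    return {
        "cage_ftf": 0.5 * f * (1 - ratio),
        "ball_pass_outer_bpfo": 0.5 * geom.n_elements * f * (1 - ratio),
        "ball_pass_inner_bpfi": 0.5 * geom.n_elements * f * (1 + ratio),
        "ball_spin_bsf": 0.5 * (geom.pitch_dia / geom.element_dia) * f * (1 - ratio**2),
    }


def gear_mesh_hz(rpm: float, teeth: int) -> float:
    """Gear mesh frequency: tooth count times the gear's shaft frequency."""
    return shaft_hz(rpm) * teeth


def blade_pass_hz(rpm: float, blades: int) -> float:
    """Vane/blade pass frequency: blade count times shaft frequency."""
    return shaft_hz(rpm) * blades


if __name__ == "__main__":
    # Illustrative dimensions only, not taken from any real bearing.
    geom = BearingGeometry(n_elements=8, element_dia=10.0, pitch_dia=50.0)
    for name, hz in bearing_fault_frequencies(1800, geom).items():
        print(f"{name}: {hz:.1f} Hz")
```

Peaks near these frequencies or their harmonics are worth noting in the log, but treat the numbers as estimates: slip and the actual contact angle shift the measured values slightly.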

What to Do During or Immediately after Failure
At some point, the rapidly approaching end of a machine's service life will be vividly apparent. In other cases, the machine may fail precipitously with little to no warning. These are known as sudden-death failures. Regardless, it is time to begin preserving evidence by securing the crime scene.

Alert operators and maintenance staff that the failure has occurred (or is about to occur) and that protection of evidence is critically important. Assuming the pre-failure data and evidence collection listed above is already in full stride, the following should be performed during machine failure or just after:

  1. Document all final readings while the machine is still running: temperatures, vibration, shaft displacement, pressure, speeds, loads, flow rates, etc.

  2. Look for signs of intrusion, botched repairs, sabotage, exterior damage, operator abuse, etc.

  3. Photograph any conditions that would be relevant including sight glasses, tank conditions, leaks, etc.

  4. Begin interviewing witnesses. Find out what they observed, heard, smelled, and anything else that would be relevant. Document the interviews.

  5. Contact your failure investigator for advice on the next steps.
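
Final readings, observations and interview notes are easier to use later if they all land in one timestamped log. The Python sketch below shows one possible shape for such a log. The class names, field names and the machine tag are assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    """One timestamped item in the failure timeline (hypothetical structure)."""
    timestamp: datetime
    recorded_by: str
    kind: str                      # e.g. "reading", "observation", "interview", "photo"
    detail: str                    # what was measured, seen, heard or said
    value: Optional[float] = None  # numeric reading, if any
    unit: str = ""                 # e.g. "degC", "mm/s", "psi"


@dataclass
class FailureLog:
    """Collects entries for one machine and returns them in time order."""
    machine_id: str
    entries: List[LogEntry] = field(default_factory=list)

    def add(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def timeline(self) -> List[LogEntry]:
        """All entries in chronological order, however they were entered."""
        return sorted(self.entries, key=lambda e: e.timestamp)

    def readings(self, detail: str) -> List[LogEntry]:
        """Chronological history of one named reading, for spotting trends."""
        return [
            e for e in self.timeline()
            if e.kind == "reading" and e.detail == detail
        ]


if __name__ == "__main__":
    log = FailureLog("P-101")  # hypothetical machine tag
    log.add(LogEntry(datetime(2024, 3, 5, 14, 20), "J. Doe", "reading",
                     "bearing temperature", 96.0, "degC"))
    log.add(LogEntry(datetime(2024, 3, 5, 13, 50), "J. Doe", "observation",
                     "rumble from drive-end bearing"))
    for entry in log.timeline():
        print(entry.timestamp.isoformat(), entry.kind, entry.detail)
```

Sorting on the timestamp rather than on entry order matters because notes are often written up after the fact, and the investigator needs the sequence of events, not the sequence of paperwork.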

What to Do During Repair and Teardown
The most important evidence is typically lost, destroyed, altered or simply mishandled during machine repair. While prompt return of a machine to service is important, don't be foolish when it comes to data collection and evidence preservation. Consider the following:

  1. Throw nothing away, especially during repairs, teardown and final inspections. Keep oil filters, used oil/grease, used coolant, BS&W, damaged parts, sludge, breathers, magnetic plugs, etc. Bag or package all preserved parts as evidence and label correctly.

  2. Don't clean anything. Save this for the failure investigator.

  3. Obtain samples of the new oil that was used for top-ups or oil changes. Save a sample of new grease if grease-lubricated.

  4. Photograph the tear-down and repair process. Use both video and digital photography. Many parts may need to be photographed with additional lighting and with macro lenses.
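
Labeling is where preserved evidence most often loses its value, so it helps to decide in advance what every bag or package should say. The Python sketch below generates a consistent label line for each item. The field list and the label layout are assumptions chosen for illustration; adapt them to your own site's policy.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EvidenceTag:
    """Hypothetical label record for one bagged or packaged item."""
    tag_id: str            # unique ID written on the bag
    machine_id: str        # machine the item came from
    item: str              # what the item is
    found_at: str          # where on the machine it was found
    collected_by: str      # who bagged it
    collected_at: datetime # when it was bagged

    def label(self) -> str:
        """Single-line label text, ending with a handling reminder."""
        when = self.collected_at.strftime("%Y-%m-%d %H:%M")
        return (
            f"[{self.tag_id}] {self.machine_id} | {self.item} | "
            f"{self.found_at} | {self.collected_by} | {when} | DO NOT CLEAN"
        )


if __name__ == "__main__":
    # Illustrative values only.
    tag = EvidenceTag("E-001", "P-101", "oil filter element",
                      "lube skid, duplex filter A", "J. Doe",
                      datetime(2024, 3, 5, 16, 5))
    print(tag.label())
```

Recording the same tag ID in photo file names and in the inspection notes lets the investigator tie each part back to where and when it was found.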

Don't force your failure investigator to speculate wildly about a root cause because the investigation was crippled by inadequate and poorly preserved evidence. Conduct FSI workshops for maintenance staff, engineering and operators, aided by expert investigators.

Establish policies and guidelines that clearly state how to conduct RCAs and preserve evidence. And remember this: It's hard to determine the exact cause of death when all that remains of the body is a finger.