Reliability-centered maintenance (RCM) continues to show up prominently in technical literature as the future strategic direction in machinery maintenance. It should be. RCM is the right thing to do when it comes to optimizing the operational reliability of plant equipment. It is important for lubrication engineers, oil analysts and other tribology professionals to understand RCM and how oil analysis and lubrication management fit into the RCM picture.

RCM is the systematic process with which to optimize reliability and associated maintenance tactics with respect to operational requirements. Economic optimization of machine reliability relative to organizational goals is the primary objective of the RCM process. Simply stated, RCM helps us ensure that if we spend a dollar on improving reliability, we are getting the full dollar back, plus some acceptable return on the investment.

As Figure 1 illustrates, the law of diminishing marginal returns applies to the implementation of reliability improvement measures. Generally speaking, the first dollar invested in reliability improvement tends to yield a higher return on investment than any dollar invested after it. The objective is to reach the optimum point, at which the benefit of reliability, expressed as a reduction in total operating cost, is greatest. RCM is a set of systematic engineering procedures for achieving and maintaining that objective.
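The diminishing-returns argument can be made concrete with a toy cost model. The sketch below is only an illustration under stated assumptions: it assumes that expected annual failure cost decays exponentially with reliability spending, and the constants `BASELINE_FAILURE_COST` and `EFFECTIVENESS` are hypothetical, not data from any plant. Total cost is the investment plus the failure cost that remains, and a simple grid search finds the level that minimizes it.

```python
import math

# Hypothetical cost model; every figure here is an illustrative assumption.
BASELINE_FAILURE_COST = 1_000_000.0  # annual failure cost with no extra investment ($)
EFFECTIVENESS = 1 / 200_000.0        # how quickly spending reduces failure cost (per $)


def failure_cost(investment: float) -> float:
    """Expected annual failure cost; each added dollar buys less reduction than the last."""
    return BASELINE_FAILURE_COST * math.exp(-EFFECTIVENESS * investment)


def total_cost(investment: float) -> float:
    """Reliability investment plus the failure cost that remains."""
    return investment + failure_cost(investment)


def marginal_return(investment: float, step: float = 1.0) -> float:
    """Failure cost avoided by the next `step` dollars invested."""
    return failure_cost(investment) - failure_cost(investment + step)


def optimum(max_investment: float = 2_000_000.0, step: float = 1_000.0) -> float:
    """Grid search for the investment level that minimizes total cost."""
    levels = [i * step for i in range(int(max_investment / step) + 1)]
    return min(levels, key=total_cost)


if __name__ == "__main__":
    for dollars in (0, 100_000, 300_000, 600_000):
        print(f"at ${dollars:>9,}: next dollar returns ${marginal_return(dollars):.2f}")
    best = optimum()
    print(f"optimum near ${best:,.0f}, total cost ${total_cost(best):,.0f}")
```

In this model the optimum falls where the next dollar returns exactly one dollar. If the organization also demands a return above break-even, spending would stop earlier, at the point where the marginal return drops to one dollar plus that hurdle.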

RCM's roots trace back to the 1960s, when it was advanced to improve the safety and reliability of commercial aircraft. Since then, it has begun to move into the industrial sector as a result of work by several authors, most notably John Moubray (RCM II), and of NASA's publication "Reliability-Centered Maintenance Guide for Facilities and Collateral Equipment". Going further back, however, RCM owes its origins to the development of the reliability engineering discipline. It was there that the fundamental analytical tools were created to estimate the reliability of electrical and mechanical components and systems. Simply stated, RCM is a component of the quality movement that is focused on improving the safety, reliability and productivity of the equipment upon which our society depends for transportation, power and energy, and goods and services.

Why RCM and Why Now? In North America, we simply are not building new factories. In an economy where prices are set globally, we must profitably produce products (from polypropylene to Plymouth Vipers) with aging equipment operated and maintained by one of the most expensive workforces in the world. This means that manufacturing assets must deliver big, and so too must the maintenance strategies, such as RCM, that we rely on to maximize profitability.

RCM guides the reliability investment toward improvement measures and techniques, including lubrication management and oil analysis, so that economic optimization is realized. NASA has identified specific guiding principles of RCM (see sidebar). Essentially, however, the reliability engineer is tasked with answering the following questions:

  • What is the system or equipment asked to do?
  • What functional failures are likely to occur?
  • What are the likely consequences of these functional failures?
  • What can be done to prevent these functional failures?

In the past, we attempted to achieve reliability with frequent rebuilds. The strategy was founded on the assumption that the failure rate of machines increases as the asset ages. While some items fail in this manner, most complex systems, e.g., those found in process and manufacturing plants, do not. In one study, 30 identical deep groove ball bearings were run to failure on a test stand under highly controlled conditions. The variation in failure times was so great that, if one statistically estimated the appropriate replacement time at the 95% confidence level, the machine would never be started! In the field, the variation in time-to-failure is even greater. Thus, we have learned that in many cases the time at which complex equipment should be rebuilt cannot be estimated effectively.
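Why wide scatter defeats time-based rebuilds is easy to see with a Weibull life model, which is commonly used for bearing fatigue. The sketch below assumes a Weibull shape `SHAPE` and characteristic life `SCALE` chosen purely for illustration (they are not the study's figures), computes the time by which 5% and 95% of units have failed, and simulates 30 nominally identical units. Note that a 5% failure-probability life is a reliability percentile, not a formal statistical confidence bound, so it only approximates the study's criterion.

```python
import math
import random


def weibull_life(p: float, shape: float, scale: float) -> float:
    """Time by which a fraction p of units has failed (the 'B' life) under a Weibull model."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


SHAPE = 1.5     # assumed Weibull slope (beta); illustrative only
SCALE = 1000.0  # assumed characteristic life (eta), in hours; illustrative only

b5 = weibull_life(0.05, SHAPE, SCALE)
b95 = weibull_life(0.95, SHAPE, SCALE)
print(f"B5 = {b5:.0f} h, B95 = {b95:.0f} h, spread = {b95 / b5:.1f}x")

# Synthetic run-to-failure test of 30 nominally identical units (not the study's data).
# random.weibullvariate(alpha, beta) takes the scale first, then the shape.
rng = random.Random(42)
lives = sorted(rng.weibullvariate(SCALE, SHAPE) for _ in range(30))
print(f"shortest {lives[0]:.0f} h, longest {lives[-1]:.0f} h")
```

Even with these moderate assumptions the B95 life is more than ten times the B5 life. A rebuild interval set to protect against early failures therefore discards most of the useful life of the typical unit.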

More recently, we have employed vibration analysis, lubrication analysis, thermography, and other condition monitoring and predictive maintenance tools in an attempt to identify early stage failures so that corrective action can be scheduled "on-condition". We have also applied proactive measures to monitor and control the root causes of degradation and failure, such as lubricant contamination, wrong or degraded lubricant, misalignment and unbalance. These measures, which employ advanced maintenance techniques and technologies, have proven very effective, but if over-applied they can be expensive and counterproductive. Moreover, in some cases they simply don't provide the improvement in reliability required to get the job done. In these instances, system redesign or redundancy is required to achieve the goals of the organization.

The process by which RCM selects a reliability strategy is systematic and logical (see Figure 2). As the flow diagram suggests, each asset is audited with respect to its role in overall system reliability and productivity. If its performance is acceptable, no changes are required. If it is unacceptable, the asset's criticality determines the most efficient means of attaining the necessary reliability. A non-critical asset, for example, is simply run to failure and then rebuilt or replaced. For mission-critical systems, advanced maintenance techniques are typically the first choice because they are relatively inexpensive compared with redesign and redundancy.

In some cases, redesign or redundancy is required to meet the objectives of the organization. Redesign in the form of proactive measures to control (and monitor) lubricant contamination, alignment, balance, etc. is usually much less expensive to deploy than failure detection strategies. Conversely, more involved system redesign is usually very expensive and often produces unpredictable results. Redundancy is the most expensive way to improve reliability, but its results are very predictable. Applying RCM helps avoid the casual adoption of the latest "panacea" strategy, along with the mistakes that waste resources and deliver mediocre, unpredictable performance.
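The selection logic described in the two paragraphs above can be sketched as a small decision function. This is a simplified reading of the text, not a reproduction of Figure 2: the three boolean inputs (`reliability_acceptable`, `critical`, `maintenance_sufficient`) are hypothetical simplifications of what is, in practice, an engineering judgment backed by analysis.

```python
from enum import Enum


class Strategy(Enum):
    NO_CHANGE = "no change required"
    RUN_TO_FAILURE = "run to failure, then rebuild or replace"
    ADVANCED_MAINTENANCE = "advanced maintenance (proactive and predictive)"
    REDESIGN_OR_REDUNDANCY = "redesign or redundancy"


def select_strategy(reliability_acceptable: bool,
                    critical: bool,
                    maintenance_sufficient: bool) -> Strategy:
    """Simplified RCM strategy selection; the inputs are hypothetical yes/no answers."""
    # An asset that already meets its reliability target needs no change.
    if reliability_acceptable:
        return Strategy.NO_CHANGE
    # A non-critical asset is allowed to run to failure.
    if not critical:
        return Strategy.RUN_TO_FAILURE
    # For a critical asset, try the less expensive option first.
    if maintenance_sufficient:
        return Strategy.ADVANCED_MAINTENANCE
    # Only when maintenance cannot close the gap: redesign or add redundancy.
    return Strategy.REDESIGN_OR_REDUNDANCY


print(select_strategy(False, True, True).value)
```

The ordering of the checks mirrors the cost ranking in the text: each branch is reached only when every cheaper option has been ruled out.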

Table 1 summarizes strategies for achieving reliability and the conditions under which they are selected in the RCM process. In today's competitive environment, organizations are looking to advanced maintenance strategies, especially condition-based maintenance, to provide the necessary reliability at minimum cost. The cost to rebuild or replace is quite high and yields dubious value. Purchasing and maintaining redundant systems is reserved for only the most critical systems where no other strategy provides satisfactory results. Advancing technology has brought condition-based maintenance to the forefront of the RCM movement. Lubrication management and oil analysis play an integral role in this movement.

The reliability engineer employs a number of analytical tools to optimize reliability relative to mission goals. Some of the more common tools include:

Reliability Statistics - Reliability statistics differ from conventional experimental statistics. They provide the means with which to estimate the likelihood that a system will achieve its mission given a stated duration and operating conditions. It is important to become knowledgeable about the methods of reliability engineering in advance of undertaking an RCM project.
Reliability Block Diagrams - Once sub-system reliability is determined, the system can be modeled effectively from a reliability perspective. Once it is modeled, the weak links usually become evident and can be addressed with reliability growth measures that eliminate the deficiencies. Figure 3 illustrates block diagrams of simple series, parallel and combination systems.
Failure Mode, Effects and Criticality Analysis (FMECA) - FMECA is the inductive process of identifying primary functional failures, their related failure modes or states, the effect of each failure mode on the operation of the system, and the criticality of each failure mode as a function of impact and likelihood. This valuable analytical tool enables failure modes to be removed or better managed through advanced maintenance techniques, redesign or redundancy.
Root Cause Failure Analysis (RCFA) - RCFA examines a failure after the fact to determine its root cause. Once the root cause is ascertained, the engineer can assess the risk of recurrence, how successfully the root cause might be controlled and the cost of controlling it. With this information, a decision can be made either to deploy control measures or to let it go.
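The block-diagram arithmetic behind series and parallel systems is short enough to show directly. The sketch below assumes that blocks fail independently, which real systems may violate through shared causes such as a common contaminated oil supply. The pump, motor and coupling reliabilities are hypothetical values for a single mission, not field data.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """Every block must survive: R = R1 * R2 * ... * Rn."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must survive: R = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical mission reliabilities for one pump train (illustrative values only).
blocks = {"pump": 0.90, "motor": 0.95, "coupling": 0.98}

single_train = series(*blocks.values())
weakest = min(blocks, key=blocks.get)  # the weak link in a series chain

# Combination system: redundant pumps in parallel, in series with motor and coupling.
redundant_pump = series(parallel(blocks["pump"], blocks["pump"]),
                        blocks["motor"], blocks["coupling"])

# Fully redundant: two complete trains in parallel.
two_trains = parallel(single_train, single_train)

print(f"single train {single_train:.4f}, weakest block: {weakest}")
print(f"redundant pump {redundant_pump:.4f}, two full trains {two_trains:.4f}")
```

Because series reliability is a product, the system can never be more reliable than its weakest block, which is why the weak link stands out once the diagram is drawn.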

RCM and the Oil Analysis Professional. After careful analysis, reliability optimization in a process or batch manufacturing plant usually includes a heavy dose of proactive and predictive maintenance. Typically, lubrication management is a top candidate for improvement in the quest to bolster mechanical system reliability. As such, the lubrication engineer or oil analysis technician will need to provide some technical precision in the following areas:

  • Lubricant-Specific FMECA
  • Deployment of Proactive Lubrication Management Measures
  • Effective Utilization of Predictive Oil Analysis Techniques

Lubricant-related failures are often lumped together somewhat casually under the term "inadequate lubrication". The lubrication engineer knows that inadequate lubrication can refer to an insufficient quantity of oil, the wrong oil, degraded oil, contaminated oil, additive depletion, poor specification or many other conditions. The lubrication engineer must support the RCM process with a more detailed lubrication-related FMECA that properly represents the equipment, the operation, the environment, etc. Figure 4 identifies several lubrication-related failure modes and the general questions for which the lubrication engineer should supply FMECA information. Many other machine-specific failure modes are revealed effectively by oil and wear debris analysis. That information must be incorporated with technical precision into the overall FMECA process.
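One way to turn a lubrication FMECA into a ranked worklist is to score each failure mode for impact and likelihood and combine the two. The sketch below uses the product of the two scores as the criticality measure, which is one common choice among several. The 1-to-10 scales and every score shown are hypothetical placeholders for a single imagined machine, not general data.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # impact, 1 (negligible) to 10 (catastrophic); hypothetical scale
    likelihood: int  # 1 (remote) to 10 (frequent); hypothetical scale

    @property
    def criticality(self) -> int:
        """Criticality as impact times likelihood (one common scoring convention)."""
        return self.severity * self.likelihood


# Placeholder scores for one hypothetical machine; real scores come from the FMECA team.
modes = [
    FailureMode("insufficient oil quantity", 9, 3),
    FailureMode("wrong oil", 7, 2),
    FailureMode("degraded oil", 6, 5),
    FailureMode("contaminated oil", 8, 6),
    FailureMode("additive depletion", 5, 4),
]

ranked = sorted(modes, key=lambda m: m.criticality, reverse=True)
for m in ranked:
    print(f"{m.criticality:3d}  {m.name}")
```

The top of the ranked list is where proactive controls or closer oil analysis would normally be considered first.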

Proactive lubrication management offers an inexpensive way to reduce the inherent failure rate of mechanical systems. When the failure rate is reduced, reliability increases for every mission duration. Often, lubrication management can eliminate the need for more drastic and expensive measures. The lubrication engineer or oil analysis technician should coordinate with the reliability engineer to understand which systems require reliability growth. The process should culminate in a list of changes: upgrading lubricant specifications, improving contamination control, improving delivery mechanisms, enhancing testing and inspection practices, educating and training staff, etc.
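The claim that a lower failure rate raises reliability for every mission duration follows directly from the constant-failure-rate model, R(t) = exp(-λt), which holds for the flat portion of a failure-rate curve. The failure rates below are hypothetical. The "after" rate simply halves the "before" rate for illustration, which under this model also doubles the mean time between failures (1/λ) from 10,000 to 20,000 hours.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of surviving the mission, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * mission_hours)


BEFORE = 1e-4  # hypothetical failure rate before lubrication improvements (per hour)
AFTER = 5e-5   # hypothetical rate after improvements (halved for illustration)

for hours in (100, 1_000, 8_760):
    print(f"{hours:>5} h mission: R before = {reliability(BEFORE, hours):.3f}, "
          f"after = {reliability(AFTER, hours):.3f}")
```

Because exp(-λt) decreases as λ grows for any positive t, lowering λ raises R(t) at every mission length, and the absolute gain is largest for long missions.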

Oil analysis has proved to be a very effective method for scheduling on-condition oil changes. Perhaps more important is the effectiveness with which oil analysis can identify machine failures and support the process of identifying the root cause of the failure. Just as blood carries clues about the health of the human body, oil carries important information about the health of machinery. In some cases, oil analysis provides the very earliest warning of trouble. In other cases, it provides confirming information. And, occasionally, it carries no information at all about a failure. Just as the physician employs all the techniques and specialists available to detect and understand problems related to health, the machinery engineer must select the right mix of analysis techniques and technologies to make the very best decision.
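On-condition oil changes are usually triggered when test results cross alarm limits relative to a new-oil reference. The sketch below checks only two parameters, viscosity change and water content. The limits `VISCOSITY_CHANGE_LIMIT` and `WATER_LIMIT_PPM` are hypothetical placeholders, not recommendations: real limits come from the equipment builder, the lubricant supplier and the plant's own trend history, and a real program would track many more properties.

```python
from dataclasses import dataclass


@dataclass
class OilSample:
    viscosity_cst: float  # kinematic viscosity at 40 °C, in cSt
    water_ppm: float      # water content, in ppm


# Hypothetical alarm limits for illustration only.
VISCOSITY_CHANGE_LIMIT = 0.10  # fractional change allowed vs. the new-oil reference
WATER_LIMIT_PPM = 500.0


def change_reasons(sample: OilSample, new_oil: OilSample) -> list[str]:
    """Return the reasons (if any) that this sample calls for an oil change."""
    reasons = []
    change = abs(sample.viscosity_cst - new_oil.viscosity_cst) / new_oil.viscosity_cst
    if change > VISCOSITY_CHANGE_LIMIT:
        reasons.append(f"viscosity off by {change:.0%}")
    if sample.water_ppm > WATER_LIMIT_PPM:
        reasons.append(f"water at {sample.water_ppm:.0f} ppm")
    return reasons


new_oil = OilSample(viscosity_cst=68.0, water_ppm=50.0)
print(change_reasons(OilSample(76.0, 800.0), new_oil))
```

An empty list means the oil stays in service. The change is scheduled by condition rather than by the calendar.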

The warning time that a monitoring technique provides in advance of a functional failure is called the P-F interval. P refers to the point at which a potential failure becomes detectable, and F refers to the point at which the functional failure occurs (see Figure 5). Simply stated, the longer the P-F interval, the more time one has to make a good decision and plan actions. As a rule, better decisions and more planning time minimize the financial impact of the event on the organization. Table 2 summarizes a general assessment of the effectiveness of the primary condition monitoring tools (lube analysis, vibration analysis and thermal analysis) with respect to the P-F interval they provide and their support for root cause failure analysis. It is always important to factor application and environmental conditions into such generalizations before finalizing technology selection and deployment decisions.
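The P-F interval also sets how often a condition monitoring task must be performed. In the worst case, a potential failure becomes detectable just after a sample or survey is taken and goes unseen until the next one, so up to one inspection interval of warning is lost. What remains is often called the net P-F interval. The sketch below applies that reasoning. The 8-week P-F interval and 3-week planning lead time are hypothetical, and inspecting at half the P-F interval is a commonly cited rule of thumb rather than a fixed requirement.

```python
def net_pf_interval(pf_interval: float, inspection_interval: float) -> float:
    """Worst-case warning time remaining after a potential failure is detected."""
    if inspection_interval >= pf_interval:
        raise ValueError("inspection interval must be shorter than the P-F interval")
    return pf_interval - inspection_interval


def max_inspection_interval(pf_interval: float, lead_time_needed: float) -> float:
    """Longest inspection interval that still leaves `lead_time_needed` of warning."""
    if lead_time_needed >= pf_interval:
        raise ValueError("the P-F interval is too short to provide the required lead time")
    return pf_interval - lead_time_needed


PF_WEEKS = 8.0  # hypothetical P-F interval for one failure mode and technique

print(net_pf_interval(PF_WEEKS, PF_WEEKS / 2))  # half-P-F rule of thumb
print(max_inspection_interval(PF_WEEKS, 3.0))   # inspect at least this often
```

A technique with a longer P-F interval allows either less frequent sampling or more planning time, which is the practical meaning of the comparison in Table 2.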

In conclusion, all reliability-improving techniques, including lubrication management and oil analysis, must harmonize and align with the organizational objectives of optimized asset utilization and maximized profit. RCM is the heart and soul of this process. The lubrication engineer and oil analyst of the future will play a vital role in the RCM process and, as part of the reliability team, in achieving and maintaining optimized asset reliability.

Moubray, John (1997). RCM II - Reliability-Centered Maintenance, Second Edition. New York: Industrial Press, Inc.
National Aeronautics and Space Administration (1996). Reliability-Centered Maintenance Guide for Facilities and Collateral Equipment.