Reliability Basics for Condition-Monitoring Professionals

Noria Corporation

As mechanical demand continues to surpass the designed output of our equipment, departments outside of maintenance are becoming aware of how reliable our equipment actually is, or more precisely, how reliable it isn’t.

As condition-monitoring professionals, we need to speak the language of reliability as fluently as possible. We need to be able to use reliability information as it relates to oil analysis and other condition-monitoring technologies to make key maintenance recommendations. Reliability data, used in conjunction with vibration analysis, oil analysis and other condition-monitoring data, can help us reach conclusions that are grounded in statistics.

In fact, if an asset has a clearly defined reliability pattern, this may help indicate the type of maintenance strategy best suited to an application. But before utilizing the vast amounts of reliability data, we need to know what it means.

Reliability Basics

Per military standard MIL-STD-721C, reliability is defined as “the duration or probability that an item can perform its intended function for a specified interval under stated conditions.” The reliability of an asset is typically expressed in hours either by mean time between failures (MTBF) or by mean time to failure (MTTF). MTBF is used as a measure of system reliability for repairable items.

Conversely, nonrepairable assets use MTTF as their basic measure. The goal of these and other reliability-related metrics is to estimate how long an asset can be expected to operate without failure. These metrics are often used to plan maintenance ahead of anticipated failures. This column focuses on MTBF.

Mean Time Between Failures and Failure Rate (λ)

MTBF is “a basic reliability measure for repairable assets and is expressed by dividing the number of hours in the observation interval by the number of random failures.” This simple calculation, as with most reliability-based data calculations, requires an observation period long enough to ensure that an accurate representation of typical productive intervals has been secured.

As an example of MTBF and failure rate, consider a typical mechanical system that had five random failures in an observation interval of 1,000 productive hours. Simply stated, the MTBF is the average time between failures. In this case, the MTBF is 1,000 hours divided by five failures, or 200 hours. The failure rate can now be calculated to provide the number of failures that occur on this equipment per hour. The failure rate is the reciprocal of the MTBF (equivalently, the number of failures divided by the hours in the observation interval) and is calculated by:

Failure Rate (λ)	= 1/MTBF
	= 1/200 hours
	= 0.005 failure per hour
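
The arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of any standard; the function names mtbf and failure_rate are hypothetical, and the inputs are the 1,000 productive hours and five failures from the example.

```python
def mtbf(productive_hours: float, failures: int) -> float:
    """Mean time between failures: observation hours divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return productive_hours / failures


def failure_rate(mtbf_hours: float) -> float:
    """Failure rate (lambda) is the reciprocal of MTBF, in failures per hour."""
    return 1.0 / mtbf_hours


# Values from the example: 1,000 productive hours, five random failures.
example_mtbf = mtbf(1000, 5)
example_lambda = failure_rate(example_mtbf)
print(f"MTBF = {example_mtbf:.0f} hours, failure rate = {example_lambda:.3f} per hour")
```

With the example inputs, this reproduces the 200-hour MTBF and the failure rate of 0.005 per hour.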

Mean Time to Repair and Repair Rate (µ)

Mean time to repair (MTTR) is “the sum of all corrective maintenance divided by the total number of failures during the observation interval,” which supplies the average amount of time required to repair the equipment upon failure. Using the same example as for MTBF, we know that the observation interval was 1,000 hours and there were five random failures. Repair times varied from failure to failure, and the sum of the repairs was 50 hours. Therefore, the average time to repair, or MTTR, is 50 hours divided by five failures, or 10 hours per repair. The repair rate is calculated by:

Repair Rate (µ)	= 1/MTTR
	= 1/10 hours
	= 0.1 repair per hour
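
The same calculation can be sketched in Python. The function names mttr and repair_rate are hypothetical and chosen only for illustration; the inputs are the 50 total repair hours and five failures from the example.

```python
def mttr(total_repair_hours: float, failures: int) -> float:
    """Mean time to repair: total corrective-maintenance hours per failure."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTTR")
    return total_repair_hours / failures


def repair_rate(mttr_hours: float) -> float:
    """Repair rate (mu) is the reciprocal of MTTR, in repairs per hour."""
    return 1.0 / mttr_hours


# Values from the example: 50 hours of repairs spread over five failures.
example_mttr = mttr(50, 5)
example_mu = repair_rate(example_mttr)
print(f"MTTR = {example_mttr:.0f} hours, repair rate = {example_mu:.1f} per hour")
```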

Once the MTBF and MTTR of a particular asset are calculated, these figures can be used to calculate its availability. Availability is defined as “the probability that an asset will be operable when needed.” It is calculated by dividing the MTBF by the sum of the MTBF and the MTTR. From the previous examples, we know that the MTBF is 200 hours and the MTTR is 10 hours.

Therefore,

Availability	= MTBF/(MTBF+MTTR)
	= 200/(200+10)
	= 200/210
	= 0.95 or 95%
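
As a quick check, the availability formula can be written as a small Python function. The name availability is hypothetical; the inputs are the 200-hour MTBF and 10-hour MTTR from the examples.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Values from the examples: MTBF of 200 hours, MTTR of 10 hours.
example_availability = availability(200, 10)
print(f"Availability = {example_availability:.3f}")
```

Carried to three decimal places the result is 0.952, which the column rounds to 95 percent.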


Our task now is to calculate the reliability of the selected equipment over a given period of time. Assuming a constant failure rate, reliability is a function of the failure rate (λ) and the time period (t). For the following example, we will use the failure rate (λ) of 0.005 failure per hour from the examples above, and for time we will assume 20 hours.

Reliability R(t)	= e^(-λt)
	= e^(-(0.005 x 20))
	= e^(-0.1)
	= 0.90
Note: e = Base of natural logarithms (2.718...)
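
The reliability calculation can be sketched in Python using the standard library's math.exp. The function name reliability is hypothetical, and the formula assumes a constant failure rate, as the exponential expression above does.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of running `hours` without failure at a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


# Values from the example: failure rate of 0.005 per hour over 20 hours.
example_reliability = reliability(0.005, 20)
print(f"Reliability over 20 hours = {example_reliability:.3f}")
```

Carried to three decimal places the result is 0.905, which the column rounds to 0.90. Passing a longer period, such as the full 200-hour MTBF, shows how quickly the probability of failure-free operation falls as the interval grows.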

From this information, it can be concluded that there is a 95 percent chance that the asset will be operable when needed and a 90 percent chance that it will operate without failure over a 20-hour period. Keep in mind that many external factors will affect the reliability and availability of the asset. Harsh environments, fluid contamination, misalignment and unbalance will all affect the end result.

This is where oil analysis and other condition-monitoring data can be used to identify failures common to like components and equipment, or to determine whether bad actors are weighing down the data.

As reliability awareness throughout the plant continues to spread from the engineering department into all areas of maintenance and production, it is our job as condition-monitoring professionals to understand the reliability basics and logic behind the data.