Oil Analysis Alarms and Limits - Field-Tested Database Techniques

Andy Ling, Dingo Maintenance Systems

In the 1970s, excitement surrounded artificial intelligence and “expert” systems. It seemed that computers might soon free people for the pursuit of higher goals by automating lower-level decision-making.

Thirty years later, the impact has remained, for the most part, minimal. Computers are still limited because their systems must be set up. People still must define the rules for decision-making. As decision complexity increases, so does the programming required. In well-defined arenas, however, expert systems have been successfully employed for oil analysis.

Automating alarms through software can significantly increase the success of an oil analysis program by reducing the demands on analysts, who can then focus their attention on the results most likely to prompt remedial action. Software also makes statistical and graphical techniques easier to apply, which can improve the quality of alarms by reducing false alarms while increasing sensitivity to real problems.

There remains, however, no substitute for the involvement of well-trained and experienced analysts. Many practitioners mistakenly treat oil analysis software as a mysterious “black box” that will provide them with answers without their active participation. Instead, the role of oil analysis software is to increase the efficiency and effectiveness of, not replace, the oil analysis practitioner.

Alarms and Automated Decision-making

Automated alarms are simply flags which prompt further investigation of potential problems. Before action, it is wise to verify the result through resampling and to consider other related parameters. For this, human interpretation is required.

Over time and with experience, however, it is possible to set alarm levels more exactly, increasing the accuracy of automated decision-making. It is important to keep automated decisions conservative, as the consequences of a false alarm (further investigation that proves unnecessary) are usually less significant than missing an opportunity to take remedial action.

Database software is a vital tool for efficiently analyzing the great volume of oil analysis data generated by an oil analysis program. Archiving oil analysis data in a database with graphical displays and relational and quantitative capabilities can aid in the selection of appropriate alarms. Alarms can be based on sophisticated statistical techniques when the data set is large and of high quality, or through simpler means such as visual interpretation of data displayed graphically.

Once these alarm-based decisions are automated via software, simple decision-making can be dramatically enhanced. Summary reports can focus the organization’s attention on those values and trends that require further analysis.

Setting Alarms with Site Data

Meaningful condition alarms are best set by examining site data, and software can be an enormous advantage. As these processes are most effective when based upon extensive historical data, there is significant value in loading pre-existing data into the oil analysis database.

Even if there is low confidence in the quality of pre-existing data, it remains valuable as a baseline. If historical data is scarce, sampling frequencies should be increased in the short term until a meaningful baseline is established. The history of a machine and a population of similar machines analyzed together are great indicators of “normal” conditions.

Level-based alarms are the simplest type to set up and use in oil analysis software. Once the acceptable limits are set, an alert will automatically occur each time the parameter wanders outside of them. Levels can be based on the history of a single machine (Figure 1) or a group of similar machines (Figure 2).
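As a minimal sketch of a level-based alarm, the Python snippet below checks readings against an acceptable band. The `LevelAlarm` class, its field names and the 0.20 ppm/hr limit are hypothetical and chosen only for illustration; real limits would come from site data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LevelAlarm:
    """Acceptable band for one parameter (hypothetical structure)."""
    parameter: str
    low: Optional[float] = None   # None means no lower limit
    high: Optional[float] = None  # None means no upper limit

    def check(self, value: float) -> bool:
        """Return True when the value falls outside the acceptable band."""
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


# Illustrative limit only; it is not taken from any real machine.
iron_rate_alarm = LevelAlarm(parameter="iron wear rate (ppm/hr)", high=0.20)
readings = [0.05, 0.07, 0.21, 0.06]
flags = [iron_rate_alarm.check(r) for r in readings]
```

Here only the third reading is flagged. A flag of this kind is a prompt for further investigation, not a diagnosis.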

Figure 1. Iron Wear Rate in Loader Engine

Figure 2. Group Iron Wear Rate for all Loader Engines in Class

For example, an alarm level at 0.18 ppm/hr for iron wear rate on the basis of global or group data may seem coarse for the single engine; however, it may be appropriate given the data available for similar machines. Figure 2 shows data points in solid red for the single machine trended in Figure 1, while the open blue data points represent data from similar machines. It is possible to distinguish a limit band containing most of the data points. A group alarm level would be set just above the main band of data.

In this instance an individual alarm would likely be set lower than the group alarm, while in other cases, individual alarms may be higher. Group data alarm setting significantly reduces the time required to develop alarm levels for large numbers of similar components.
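The article sets the group level by eye from a plot. One possible way to approximate "just above the main band" numerically is sketched below. It assumes the top of the main band can be represented by the 95th percentile of the pooled readings, with the alarm placed a fixed margin above it. The percentile, the margin, the function name and the sample data are all illustrative assumptions, not a method given in the article.

```python
import statistics


def group_alarm_level(pooled_rates, margin=0.10):
    """Estimate a group alarm level just above the main band of data.

    Assumption: the top of the main band is approximated by the 95th
    percentile of the pooled readings, and the alarm sits `margin`
    (10 percent by default) above that value.
    """
    if len(pooled_rates) < 2:
        raise ValueError("need at least two readings")
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(pooled_rates, n=20, method="inclusive")[-1]
    return p95 * (1.0 + margin)


# Hypothetical wear rates (ppm/hr) for two similar engines.
group = {
    "engine_a": [0.05, 0.06, 0.04, 0.07, 0.05, 0.06, 0.08, 0.05, 0.06, 0.07],
    "engine_b": [0.06, 0.09, 0.05, 0.10, 0.07, 0.30, 0.06, 0.08, 0.05, 0.07],
}
pooled = [rate for rates in group.values() for rate in rates]
level = group_alarm_level(pooled)
above = [rate for rate in pooled if rate > level]
```

With this sample data the single 0.30 ppm/hr reading lies above the computed level and the rest of the band lies below it. A level derived this way should still be checked against the plotted data and repair records before it is applied.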

When the operation and history, and not just design, of machines are truly similar, group data provides more information for determining normal versus abnormal states. Group alarms, however, may be too conservative or not conservative enough for particular machines.

Experience will help determine whether group alarm levels are appropriate for individual members. If the trend characteristics of all group members are similar and variations are small, it makes sense to globally apply group alarm levels.

Checking repair records for failures or repairs that coincided with historical spikes or trend abnormalities and maintaining records on false alarms can also help determine if alarm levels are appropriate. A balance must always be struck between alarm conservatism and excessive false alarms that consume limited analysis and follow-up resources.

Corporations with multiple sites using similar equipment may consider using the oil analysis data from all operations together. This must be applied with care, as data from one site may have significantly different trend characteristics than data from another site due to variations in operating environment, sampling techniques, lab testing techniques, lubricant used, etc.

Such practices, however, allow corporations to leverage analysis expertise, with many sites benefiting from the expertise of a few sites or individuals. They can also make “historical” data instantly available for initial alarm setting where none was available locally. This can save time and minimize frustration when setting up a new program or adding new inspection points to an existing program. If all potential causes of variation are also stored in the database, their effects can be assessed.

Web-based systems designed to handle this information are in their infancy, but they can be expected to play a significant role in the future of oil analysis.

Statistical Alarms

Statistical alarms are level-based alarms that have been determined by some statistical method. When the distribution of points is “normal,” the plot of frequency vs. measured value is a bell-shaped curve. The standard deviation is a measure of the “spread” or deviation of data from the mean or average.

The higher the standard deviation, the greater the data spread. In a truly normally distributed population, 68.3 percent of data points lie within one standard deviation either side of the mean, and 99.7 percent lie within three standard deviations either side of the mean.

Alarms are set at statistically significant distances from the mean. Typically, the average plus one standard deviation is used for a caution limit, while the average plus three standard deviations is used for a critical alarm limit (see sidebar).
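The caution and critical limits described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names and baseline values are hypothetical, and the sample standard deviation (divisor n - 1) is used, which the article does not specify.

```python
import statistics


def statistical_limits(values):
    """Return the mean, caution and critical limits for a baseline data set."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return {
        "mean": mean,
        "caution": mean + sd,        # average plus one standard deviation
        "critical": mean + 3 * sd,   # average plus three standard deviations
    }


def classify(value, limits):
    """Classify a new reading against previously computed limits."""
    if value > limits["critical"]:
        return "critical"
    if value > limits["caution"]:
        return "caution"
    return "normal"


# Hypothetical baseline readings for one parameter.
baseline = [10, 12, 11, 13, 9, 10, 12, 11, 10, 12]
limits = statistical_limits(baseline)
results = [classify(v, limits) for v in (11, 13, 16)]
```

For this baseline the mean is 11 and the standard deviation is about 1.25, so a reading of 13 falls in the caution band and a reading of 16 exceeds the critical limit.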

The rigorous statistical method should be applied only to parameters that are not expected to change over time. Some parameters normally drift up or down and do not indicate a problem until an excessive change has occurred, such as wear metal concentrations in an unfiltered system. Applying fixed statistical limits to these parameters will result in ever-increasing false alarms.

Other alarms are more appropriate in these situations, such as absolute level limits, statistical rate of change (for more information see “Statistically Derived Rate-of-Change, Oil Analysis Limits and Alarms,” in the January - February 2002 issue of Practicing Oil Analysis magazine) or the pattern-finding alarms discussed next.

It is also worth noting that any intentional systematic change (like upgrading a filter, maintenance work or process changes) will alter the validity of statistically derived limits. Such changes normally warrant a new baseline period. The statistical process control (SPC) method is also sensitive to bad data. For these reasons, some users have experienced difficulty with SPC in practice.

Pattern-finding Alarms

Oil analysis (OA) software provides opportunities for automating alarms other than level alarms. Finding patterns, such as sudden spikes, within the trends is invaluable as it can provide early warning before a level-based alarm is triggered. Using these alarms in conjunction with level-based alarms increases analysis sensitivity to problems, but also increases the overall number of alarms. Whether these will primarily be false or actual alarms will depend on how well suited the data is to the alarm type and level.
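One simple way a sudden spike could be detected is sketched below. The rule used here is an assumption made for illustration and is not taken from the article: a reading is flagged when its rise over the previous reading exceeds a chosen multiple of the median sample-to-sample change seen so far. The function name, the `factor` and `min_history` parameters and the trend data are hypothetical.

```python
import statistics


def find_spikes(values, factor=3.0, min_history=3):
    """Return indices where a reading jumps unusually far above the previous one.

    Assumption: "unusual" means a rise greater than `factor` times the
    median absolute sample-to-sample change observed before that point.
    Only upward jumps are flagged.
    """
    spikes = []
    steps = []  # absolute changes observed so far
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        if len(steps) >= min_history:
            typical = statistics.median(steps)
            if typical > 0 and step > factor * typical:
                spikes.append(i)
        steps.append(abs(step))
    return spikes


# Hypothetical trend; a level alarm set at 40 would never trigger here.
trend = [20, 22, 21, 23, 22, 35, 24]
spike_indices = find_spikes(trend)
```

The jump from 22 to 35 is flagged even though every reading stays below the hypothetical level limit of 40, which is the kind of early warning described above.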

Computed manually, these methods are time consuming. Archives of historical data and computing algorithms speed this process greatly, requiring analyst intervention only when an alarm is triggered.

When trying to distinguish between normal and abnormal wear situations, these techniques are all best applied to calculated rates of change, and not the raw wear metal concentrations supplied by the lab. Limited success will be experienced in applying alarms only to raw wear metal concentrations as these can vary significantly on any one component and still be normal.
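As a sketch of how a rate of change might be calculated from lab data, the function below divides the change in concentration by the operating hours between consecutive samples. It assumes each sample records the hours on the oil and that no oil change or top-up occurred between samples. The function name and data are hypothetical.

```python
def wear_rates(samples):
    """Compute wear rates (ppm/hr) between consecutive samples.

    `samples` is a list of (oil_hours, ppm) tuples in sampling order.
    Returns one rate per consecutive pair, or None where the hours did not
    advance (for example after an oil change), since no rate can be derived.
    """
    rates = []
    for (h0, c0), (h1, c1) in zip(samples, samples[1:]):
        hours = h1 - h0
        if hours <= 0:
            rates.append(None)  # oil change or bad hour-meter entry
        else:
            rates.append((c1 - c0) / hours)
    return rates


# Hypothetical iron readings: (hours on oil, ppm).
iron_samples = [(0, 0.0), (250, 10.0), (500, 22.5), (100, 5.0)]
iron_rates = wear_rates(iron_samples)
```

The resulting rates (0.04 and 0.05 ppm/hr, then no value after the oil change) are the figures to which level, statistical or pattern-finding alarms would be applied.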

In well-defined areas, expert software systems can have a positive impact on oil analysis. True success, however, still requires a commitment of resources to the program, whether internal or third party. A skilled person interpreting oil analysis results based on maintenance histories and operational data remains the most reliable way to make sound judgments. OA software with carefully chosen alarms that take into account component properties and behavior will make the most effective use of time by helping quickly determine whether a result is normal or abnormal.

Statistical alarms can be set for many oil analysis parameters by assuming the data is distributed normally and calculating the average and standard deviations based on the formulae below. Using this approach, caution limits for parameters such as wear metals are typically set at the average plus one standard deviation, with critical limits set at the average plus three standard deviations.
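The formulae themselves are not reproduced in this text. The standard definitions are given here as a reference, assuming the sample standard deviation (divisor n - 1) is intended; the original may have used the population form with divisor n. The symbols are introduced for illustration: x_i is an individual reading and n is the number of readings.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

\text{caution limit} = \bar{x} + s
\qquad
\text{critical limit} = \bar{x} + 3s
```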

Even if the data is not normally distributed, according to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal and can be used to set meaningful alarm limits. In general, as few as 10 data points, or fewer, can be used to set limits, although the accuracy and reliability of these limits will be lower than if a larger dataset is used.

In most cases, 30 data points is considered the minimum number of points for statistical accuracy. However, for practical purposes, it is usually wise to start setting oil analysis limits once five to 10 data points are available, then revise the limits to include additional data points when they become available to increase the sensitivity and accuracy of statistically derived limits.