
Caterpillar mechanic servicing a productive engine long past its predicted Likelihood of Failure.

An incredible number of technical professionals believe that effective asset management can only be performed with a meaningful prediction of the likelihood of failure. This belief, in turn, has driven a 30-year push to collect massive amounts of historical data and information.


Nothing could be further from the truth. Evaluating the likelihood of failure is a fool's game – more wrong than right. As it turns out, small data is just as good as big data. And in the cases that matter most, small data (at best) is all we will have.


Complexity

A system is a collection of interrelated parts that produce an outcome the individual parts cannot produce on their own. A complex system, by definition, contains many such parts. Most industrial and human systems are complex.


A system's upside is that not all of its interrelated parts have to perform optimally to produce the desired result. This is also the downside––it is difficult, if not impossible, to develop a complete predictive description of how each part or asset will fail. Context (operating conditions and environment) matters, the definition of failure matters, and many components fail with the same symptoms.

Sorting through the interrelated behavior to determine which components cause what is necessarily resource intensive, assuming we can even pinpoint when a failure occurred. Most organizations do not expend the necessary resources with any degree of consistency. In other words, in practice, we have difficulty determining which components truly cause a larger system not to perform as the user desires.


Sample Size and Evolving Systems

Most systems have a limited number of major assets, and we do not let those run to failure. We simply do not have enough failure data on the things that matter most in their specific operating conditions and environment.


Major assets and their components are also constantly changing. This is partly driven by the maintenance and engineering work required to keep them performing, and partly the result of manufacturers improving the components that make up the system over the long life of most major systems.


So even if we did let the things that matter most run to failure, the differences among the evolving major assets and components would create sample size problems in the statistical analysis.
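To make the sample size point concrete, here is a minimal sketch (assuming a constant failure rate, a time-truncated test, and illustrative numbers rather than field data) of how wide the confidence interval on a failure rate remains when only a handful of failures have been observed. It uses the standard chi-square interval and scipy:

```python
# A minimal sketch (not field data) of why small failure counts hobble prediction:
# the confidence interval on a constant failure rate, estimated from r failures
# over total operating time T, stays very wide when r is small.
from scipy.stats import chi2

def failure_rate_interval(r, total_hours, confidence=0.90):
    """Two-sided chi-square interval on the failure rate (per hour) for a
    time-truncated test with r observed failures over total_hours."""
    alpha = 1.0 - confidence
    lower = chi2.ppf(alpha / 2, 2 * r) / (2 * total_hours)
    upper = chi2.ppf(1 - alpha / 2, 2 * (r + 1)) / (2 * total_hours)
    return lower, upper

# Illustrative cases: 2 failures vs. 200 failures at the same underlying rate
for r, hours in [(2, 50_000), (200, 5_000_000)]:
    point = r / hours
    lo, hi = failure_rate_interval(r, hours)
    print(f"{r:>3} failures in {hours:>9,} h: point={point:.2e}/h, "
          f"90% interval=({lo:.2e}, {hi:.2e})/h, upper/lower ratio={hi / lo:.1f}")
```

With two failures, the 90 percent interval spans more than an order of magnitude; with two hundred failures, it tightens to roughly plus or minus 12 percent of the point estimate. Most critical assets never come close to producing two hundred relevant failures.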


Failure Reporting

Failure reporting changes over the life of an organization and over the life of major assets. Facilities within the same organization frequently report failures differently than their peer facilities. In some sectors, such as the Department of Defense, failure reporting is done meticulously in normal times but is often neglected under the stress of deployment, precisely when the data is most relevant.


In other words, failure reporting, interpretation, and presentation can vary greatly among different users of the same systems, as well as among third-party service companies and manufacturers.



We Are Doing a Good Job

A contradiction in collecting large sets of failure data is frequently cited by Reliability Centered Maintenance (RCM) experts and credited to Howard Resnikoff, a former director of the Division of Information Science and Technology at the National Science Foundation. Resnikoff's conundrum states that if we perform our work well, then we are suppressing the very failure data we need to build a statistically accurate failure model.


In other words, we may be collecting lots of failure data, but it is the wrong data (the data that does not matter much) because we are not allowing the things that matter most to fail often enough.
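A minimal Monte Carlo sketch of the conundrum, using assumed numbers (a Weibull-distributed component life with shape 3 and a characteristic life of 10 years, preventively replaced every 3 years), shows how effective preventive maintenance starves the statistician of failure observations:

```python
# A minimal sketch of Resnikoff's conundrum with assumed, illustrative parameters:
# good preventive replacement means the failures needed for a model rarely occur.
import random

random.seed(1)
SHAPE, SCALE = 3.0, 10.0      # assumed Weibull shape and characteristic life (years)
PM_INTERVAL = 3.0             # assumed preventive replacement interval (years)
CYCLES = 1000                 # replacement cycles simulated

observed_failures = 0
for _ in range(CYCLES):
    life = random.weibullvariate(SCALE, SHAPE)   # time to failure if left alone
    if life < PM_INTERVAL:                        # failure occurs before the PM
        observed_failures += 1

print(f"{observed_failures} failures observed in {CYCLES} replacement cycles "
      f"({observed_failures / CYCLES:.1%}) -- too few to fit a credible failure model")
```

Under these assumptions, only a few percent of cycles end in an observed failure, so even a thousand replacement cycles yield only a few dozen data points for the failure mode that matters most.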


How Does This Help You?

You are doing the right thing if you are collecting monitoring data from your most critical systems at the right locations, based on the way you believe things fail. That means you have probably performed a criticality analysis and a failure modes and effects analysis (FMEA), and developed some type of finite element model before installing the data collection devices.


You are also doing the right thing if you are populating your enterprise asset management system with the best failure information you can and analyzing it for obvious trends in system performance. Hopefully, you are also experimenting with and tweaking your system maintenance protocols and frequencies.


You are doing the WRONG thing if you make rigorous attempts to drive your asset management program by believing that you can predict the likelihood of failure correctly and consistently. There is not enough data to make a statistically accurate prediction: we do not consistently spend the resources to understand component-level failures, we collect and report data inconsistently, critical systems are constantly evolving, and we do a good job of preventing failure in the things that matter most (so most of the failure data we do have comes from things that matter less).


This leaves us in the analytical world of small data combined with good judgment. The next time you think you can accurately predict the likelihood of failure, try betting on when anything of importance will fail, even your hot water heater or your car.

 

JD Solomon Inc provides reliability and risk assessments through our asset management services. Contact us for more information on how to streamline your risk-based asset management program and make it more effective.




Knowing the elephant is big is usually enough of an evaluation for senior management. (photo: John Lund)

The idea that anything worth doing is worth measuring is a persistent one. The equally persistent problem is that measuring performance becomes much more difficult as we move into the realms of strategic plans and management systems.


I usually write about technical subject matter that I am working on for clients. For strategic plans and asset management programs, that means there is no shortage of material and no shortage of market sectors. It is a topic I could write about on a quarterly basis if I chose.


For now, let’s touch on the fundamentals one more time.


Measures versus Indicators

Measures and indicators are two ways we can evaluate performance. Let’s go to the dictionary.


Measure means the dimensions, capacity, or amount of something ascertained by measuring.

Indicate means to be a sign, symptom, or index of. Indicators are defined as any of a group of statistical values (such as the level of employment) that, taken together, give an indication of something broader (such as the health of the economy).

So, an indicator indicates something. A measure is the quantity, size, weight, distance, or capacity of a substance compared to a designated standard.


Different Organizational Levels

Similar to levels of service and risk, performance evaluation can be applied at three levels of the organization.


Strategic-level indicators answer the question, “How do I manage progress toward strategic goals?” The management horizon is quarterly or annually. The priority is meeting the strategic goals and objectives of the organization. The metrics at this level are usually weight-of-the-evidence indicators.


Operational-level indicators usually answer the dual questions, “Are business outcomes being accomplished?” and “Are customer requirements being met?” The management horizon is monthly or quarterly. The priority is on trends and corrective action. The metrics at this level can be either indicators or measures and are most commonly a mixture.


Tactical-level indicators answer questions like, “Is the daily operation efficient?” and “Are business unit adjustments being made promptly?” The management horizon is weekly or daily. The metrics at this level are mostly measures and can include some indicators.


The Number of Performance Measures

Four to eight at each level for each strategic goal is usually sufficient. Care should be taken not to double-count or have overlapping indicators. For example, financial performance can be broken into just four indicator categories: profitability, liquidity, debt, and operating. Of course, in practice, some indicators should be trailing (past performance) and some leading (predictive). Still, the point is that having twenty financial ratios, most of them in one category, is overkill.
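As a toy illustration (the figures and the choice of one representative ratio per category below are assumptions, not guidance), four well-chosen ratios can cover all four categories:

```python
# A minimal sketch with made-up figures: one representative ratio per financial
# category (profitability, liquidity, debt, operating) instead of twenty overlaps.
financials = {
    "net_income": 1.2e6, "revenue": 15.0e6,                     # profitability inputs
    "current_assets": 4.0e6, "current_liabilities": 2.5e6,      # liquidity inputs
    "total_debt": 6.0e6, "total_equity": 9.0e6,                 # debt inputs
    "operating_expenses": 12.0e6, "operating_revenue": 14.0e6,  # operating inputs
}

indicators = {
    "profitability (net margin)": financials["net_income"] / financials["revenue"],
    "liquidity (current ratio)": financials["current_assets"] / financials["current_liabilities"],
    "debt (debt-to-equity)": financials["total_debt"] / financials["total_equity"],
    "operating (operating ratio)": financials["operating_expenses"] / financials["operating_revenue"],
}

for name, value in indicators.items():
    print(f"{name:32s} {value:.2f}")
```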


Organizational Capacity

Organizational capacity is a function of organizational culture, business climate, and the CID (Capability, Information, and Decision Structure) nexus. Organizations with less capacity need to keep performance measurement simpler. For that matter, as with any management or monitoring system, keep the first-generation effort as simple as possible and build complexity later (and as needed).


Organizations tend to rush to create dashboards that compile performance information from multiple sources. In most cases, the required data is not being tracked except where it is specifically needed at the business unit (tactical) level. Harmonizing the data at higher levels is difficult because decisions are not made that way.


What This Means

Indicators are the best way to measure the performance of strategic plans and asset management systems. Create a scoring system that reduces the subjectivity of the information evaluated by a cross-functional team. Be comfortable with a weight-of-the-evidence approach. Start simple (red-yellow-green or one-to-five) and build complexity later. Most organizations find that later never comes –– or they spend millions of dollars developing and implementing formal measures only to end up back at the simpler indicators.
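Here is a minimal sketch of what such a first-generation scoring system could look like. The indicator names, the five-person team scores, and the red-yellow-green thresholds are all hypothetical assumptions for illustration:

```python
# A minimal weight-of-the-evidence roll-up: a cross-functional team scores each
# indicator 1-5, the scores are averaged, and the overall result maps to a band.
# Thresholds and indicator names are assumptions, not a standard.
from statistics import mean

def rollup(scores_by_indicator):
    """Average the 1-5 team scores per indicator, then band the overall result."""
    indicator_scores = {name: mean(scores) for name, scores in scores_by_indicator.items()}
    overall = mean(indicator_scores.values())
    if overall >= 4.0:
        band = "green"
    elif overall >= 3.0:
        band = "yellow"
    else:
        band = "red"
    return indicator_scores, overall, band

# Hypothetical strategic goal with four indicators, scored by a five-person team
team_scores = {
    "asset condition trend": [4, 4, 3, 5, 4],
    "preventive work completed on time": [3, 3, 4, 3, 3],
    "unplanned downtime direction": [2, 3, 2, 3, 3],
    "budget adherence": [4, 4, 4, 5, 4],
}

per_indicator, overall, band = rollup(team_scores)
for name, score in per_indicator.items():
    print(f"{name:36s} {score:.1f}")
print(f"overall: {overall:.1f} -> {band}")
```

The design choice here is deliberate simplicity: a roll-up like this can be run in a spreadsheet or a short script, which is about as much complexity as a first-generation effort should carry.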

 

JD Solomon Inc provides program development, asset management, and facilitation services for organizations working in the built and natural environment.





The prestigious Society for Maintenance and Reliability Professionals (SMRP) annual conference is coming to Raleigh for the first time. The event will be held October 17 to 20 at the Raleigh Convention Center. JD Solomon, Inc will be providing the following workshops and presentations.


Workshop: How to Get Your Boss’s Boss to Understand

Getting your boss’s boss to understand is one of the most challenging jobs for a technical professional. Senior management frequently does not understand concepts related to reliability, risk, and resiliency. Texts and guidance documents often reference the importance of better communication and education; however, limited practical guidance is provided. This session will fill some of these gaps and specifically target front-line professionals in their role as trusted advisors to senior decision makers.


Session: How Baseball Teaches Us Everything We Need to Know About Human Error

This session will explore the practical side of human error. Human error is a subset of the broad and diverse subject of Human Reliability Analysis (HRA). There are more than 30 methodologies of varying complexity for evaluating human reliability, and this proliferation creates confusion among practitioners. Parallels will be drawn between human error in the facilities and infrastructure workplace and human errors in baseball.


Workshop: How To Be A Better Facilitator Of RCA, FMEA, & Other Reliability Methods

Facilitation is an essential skill for technical professionals working in group environments. Maintenance and reliability education does not include formal facilitation training, and professionals are left to learn on the job. Unfortunately, many of our key analytical tools depend on the quality of the facilitation. This session will provide key tips and pointers to help the maintenance and reliability professional move from being a good facilitator to becoming a great one. The workshop will focus directly on the facilitated methods and tools most commonly used by maintenance and reliability professionals - failure modes and effects analysis (FMEA), root cause analysis (RCA), reliability block diagrams, and fault trees.


The Society for Maintenance & Reliability Professionals (SMRP) is a not-for-profit professional society formed by practitioners to advance the maintenance, reliability, and physical asset management profession. As its tagline says, “The Society by Practitioners, for Practitioners.”


JD Solomon, Inc provides consulting services related to program development, asset management (including reliability and risk), and facilitation for facilities, infrastructure, and the natural environment. JD Solomon has been a regular presenter at SMRP conferences, workshops, and webinars for many years.


