MTBF Predictions Often Misused

Quote of the Day

Nothing in the world is in a bigger hurry than a dead fish.

– I heard a Norwegian fisherman say this during a radio interview while discussing his need for a bridge to speed the truck transport of his fish to market.

Figure 1: P-8 Poseidon, a new patrol plane that is having some teething problems. (Wikipedia)

Performing an MTBF prediction is to HW design as putting a license plate on a car is to driving it. You need the plate to drive the car legally, but it adds nothing to your driving experience. Similarly, every company I have worked for demands a predicted MTBF for every HW product, but the prediction adds no value to the design process. I would argue that MTBF predictions add negative value to product deployment because customers often misuse them to estimate spares requirements and field support costs. Since no one has told customers otherwise, they think the MTBF value accurately reflects the real failure rate of a product. In fact, an MTBF prediction provides only a gross estimate of the rate of random part failure at product maturity. Real products, especially at introduction, rarely fail because of random part failures. Instead, their failures are dominated by issues like:

  • Software upgrade issues
    I often see products returned because of a software problem. To the customer, the product did not work, so they returned it.
  • Environmental problems
    Lightning is the most common environmental issue that I see, but there are others. They often have to do with a misapplication of the product. For example, the product was deployed in a wet environment, but it is sensitive to moisture.
  • Installation-induced failures
    Many modern products are complex, and their installation is not simple. For example, I have seen products deployed with improperly stored backup batteries, which sulfated the batteries and rendered them unchargeable. When the backup power was needed, the product failed.
  • Products returned with no discernible problems
    Over half the products I see returned are for problems that I cannot replicate. This is usually because of training issues.

Because the return rates for these failure modes are much higher than the random part failure rate, customers are shocked when they see that their real field support costs are much higher than the costs predicted using MTBF. Also, you often see unexpected reductions in system availability because the number of spares was seriously underestimated – another number commonly estimated using MTBF.
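To make the mismatch concrete, here is a minimal Python sketch of how a spares estimate is commonly derived from a predicted MTBF, assuming a constant failure rate and Poisson-distributed demand. The fleet size, MTBF value, and the multiplier for non-random returns are all hypothetical numbers chosen for illustration, and the function names are my own, not from any standard tool.

```python
import math

HOURS_PER_YEAR = 8760


def expected_failures(units, mtbf_hours, hours=HOURS_PER_YEAR):
    """Expected failure count for a fleet, assuming a constant failure rate."""
    return units * hours / mtbf_hours


def spares_needed(expected_demand, confidence=0.95):
    """Smallest stock level s such that P(Poisson demand <= s) >= confidence."""
    term = math.exp(-expected_demand)  # P(demand = 0)
    cumulative = term
    s = 0
    while cumulative < confidence:
        s += 1
        term *= expected_demand / s  # P(demand = s) from P(demand = s - 1)
        cumulative += term
    return s


if __name__ == "__main__":
    # Hypothetical fleet: 1000 units with a predicted MTBF of 500,000 hours.
    predicted = expected_failures(units=1000, mtbf_hours=500_000)

    # Hypothetical assumption: software, environmental, installation, and
    # no-trouble-found returns make the real return rate 5x the prediction.
    actual = 5 * predicted

    print("Predicted returns per year:", predicted)
    print("Spares sized from MTBF:", spares_needed(predicted))
    print("Spares needed at the actual return rate:", spares_needed(actual))
```

The point of the sketch is only that the spares calculation is as good as the failure rate fed into it. If the real return rate is a multiple of the MTBF-based rate, the stock level sized from the prediction falls well short.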

You rarely see this problem mentioned in the press, but I saw the following paragraph in an article on the troubled introduction of the P-8A Poseidon (Figure 1) that alludes to the issue.

Moreover, the report found, data from the operational testing and evaluation of the P-8A's latest software engineering upgrade as well as metrics from the Navy "show consistently negative trends in fleet-wide aircraft operational availability due to a shortage of spare parts and increased maintenance requirements."

When I have seen this problem, it always traces back to the use of an incorrect failure rate model. For this reason, I always estimate field support manpower and spare numbers based on the historical failure rates of similar products at similar levels of maturity.
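A history-based estimate can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular tool: the `FieldRecord` structure, the product names, and every number in `history` are hypothetical, and maturity is approximated simply by the year in the field.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    product: str
    year_in_field: int  # 1 = first year after introduction
    returns: int        # all returns, whatever the cause
    unit_years: float   # units deployed multiplied by years observed


def historical_return_rate(records, year_in_field):
    """Returns per unit-year, pooled over comparable products at one maturity level."""
    matching = [r for r in records if r.year_in_field == year_in_field]
    unit_years = sum(r.unit_years for r in matching)
    if unit_years == 0:
        raise ValueError("no comparable history at this maturity level")
    return sum(r.returns for r in matching) / unit_years


def expected_annual_returns(records, year_in_field, planned_units):
    """Expected yearly returns for a planned deployment at the given maturity."""
    return planned_units * historical_return_rate(records, year_in_field)


# Hypothetical return history for two similar products.
history = [
    FieldRecord("product_a", 1, 120, 1000.0),
    FieldRecord("product_a", 3, 30, 1000.0),
    FieldRecord("product_b", 1, 80, 1000.0),
]

if __name__ == "__main__":
    # Plan for 500 units of a new, similar product in its first year.
    print(expected_annual_returns(history, year_in_field=1, planned_units=500))
```

Because the rate counts every return regardless of cause, it captures the software, environmental, installation, and no-trouble-found returns that an MTBF prediction leaves out.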
