Re: MTBF (was: PLC Failure rate data)

Joe Jansen/TECH/HQ/KEMET/US

OK, so I read the article and learned some stuff. ;^)

If I am reading this correctly, MTBF only describes the failure rate during the normal-life period of the life cycle, and does not account for infant mortality or wear-out failure modes. Therefore, it is possible for a device that will fail due to "wear out" in 10 years to still have an MTBF of 400 years. MTBF seems to describe failure rate better than it does any life expectancy, then, correct? MTTF, as defined in the article, appears to take wear-out failures into account, and would therefore be somewhat closer to reality.

Am I interpreting correctly? Doesn't this also mean that the MTBF numbers are somewhat indirectly dependent on the number of units in the field?

--Joe Jansen
 
Michael Griffin

On November 26, 2003 00:12, Joe Jansen wrote: <clip>
> If I am reading this correctly MTBF only describes the failure rate during
> the normal life period of the life cycle,
<clip>
> Am I interpreting correctly? Doesn't this also mean that the MTBF numbers
> are somewhat indirectly dependent on the number of units in the field?
<clip>


I'm not an expert on this subject, but my understanding of it is that the best way to think of MTBF is not with respect to how long a single piece of hardware will last, but rather how frequently you would expect "random" failures if you had a large number of identical units.

For example, if you had 100 PLCs in service, and the MTBF was 24,000 hours, you would expect to see (on average) a failure every 10 days. This would obviously keep you rather busy. If, however, the MTBF was 2,400,000 hours, then you would expect a failure every 1000 days (about 2 3/4 years).
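
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and it assumes the units fail independently at a constant rate (the "normal life" period), so the fleet-wide interval is simply the MTBF divided by the number of units.

```python
HOURS_PER_DAY = 24


def fleet_failure_interval_hours(mtbf_hours: float, units: int) -> float:
    """Average time between failures across a whole fleet.

    Assumes independent, identical units with a constant failure rate,
    so the fleet sees failures `units` times as often as one unit would.
    """
    if units <= 0:
        raise ValueError("units must be a positive integer")
    return mtbf_hours / units


# The two cases from the post: 100 PLCs at 24,000 h and at 2,400,000 h MTBF.
for mtbf in (24_000, 2_400_000):
    days = fleet_failure_interval_hours(mtbf, 100) / HOURS_PER_DAY
    print(f"MTBF {mtbf:,} h with 100 units: about one failure every {days:,.0f} days")
```

With real equipment the interval will scatter around this average; the figure only says how often to expect a random failure somewhere in the fleet, not when any particular unit will fail.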

--

************************
Michael Griffin
London, Ont. Canada
************************
 
ScienceOfficer

Joe---

You are very close on all counts, but still missing a bigger picture.

MTBF describes the expected failure rate during the expected service life of a population, generally with a lot of constraints like temperature, humidity, vibration, shock, etc., as well as an expectation that the use is the intended one. For any component worth rating with an MTBF number, the user expectation would be that few will fail in normal use, and the MTTF number would more closely match the user experience for useful lifetime. For major PLC products, that's at least ten years of continuous operation at design limit conditions.

(A common experience today might be automotive camshaft timing belts. The MTBF must be millions of miles, but the MTTF dictates that the manufacturers recommend that we replace them every 80,000 miles.)

Still, that's for all users, not just one. If you have lots of identical components in identical use, then your overall MTBF and MTTF experience will approach the official statistics. If you have only one, and it never fails, your MTBF appears to be infinity (but it isn't--- replace your camshaft timing belt at the suggested interval!). If you have only one, and it fails during its expected service life, then your personal experience is far worse than the norm. For a component, MTBF describes the experience of all users--- not any one user.

If there are a lot of identical components in the field, then the field experience MTBF number becomes pretty solid. Note that predicted MTBFs are used when a product is launched, and those are gradually replaced by field experience as the field population climbs. If you see a product get radically redesigned within months of launch, you may infer that failure to meet expected MTBF might be one reason.

Another part of the larger picture is that no one uses just components--- we all use systems. The MTBF of a system is a statistical function of the MTBFs of all of the components. Many of us are concerned about the availability of one particular manufacturing system every day, a unique one representing a population of exactly one. Its MTBF will be significantly less than the service life of any of the components, in general, because that's how the world works!
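
One common way to make that "statistical function" concrete is the series model, sketched below in Python. It is an assumption for illustration, not something stated in this thread: the system is taken to fail whenever any single component fails, components are independent, and each has a constant failure rate, so the failure rates (1/MTBF) add. The function name and the component MTBF values are hypothetical.

```python
def series_system_mtbf(component_mtbfs):
    """MTBF of a system that is down as soon as any one component fails.

    Assumes independent components with constant failure rates, so the
    system failure rate is the sum of the component rates (1 / MTBF).
    """
    mtbfs = list(component_mtbfs)
    if not mtbfs or any(m <= 0 for m in mtbfs):
        raise ValueError("need at least one component, all with positive MTBF")
    total_failure_rate = sum(1.0 / m for m in mtbfs)
    return 1.0 / total_failure_rate


# Hypothetical component MTBFs in hours, chosen only for illustration.
components = [400_000, 200_000, 100_000]
system = series_system_mtbf(components)
print(f"System MTBF: {system:,.0f} h (worst single component: {min(components):,} h)")
```

Under these assumptions the system MTBF always comes out below that of the weakest component, and it keeps dropping as components are added.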

The math can be simplified and made intuitive by transforming MTBF into an availability number by component. If availability under a certain circumstance is expressed as a probability from zero to one, then the availability of the overall system is simply the product of the component availabilities. Thus, if you have two components, each with .99 availability each day, your system will be 98.01% likely to be available each day, because .99*.99=.9801, or 98.01%. The more components you have, the lower your real availability will be. Also, your system will never be more available than the least reliable component. (.99*.50=.495)
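
The product rule above is easy to try out. The following is a minimal Python sketch with a made-up function name; it assumes, as the paragraph does, that every component is needed and that the availabilities are independent probabilities between zero and one.

```python
import math


def system_availability(component_availabilities):
    """Availability of a system that needs every component working.

    Each value is a probability in [0, 1]; components are assumed
    independent, so the system availability is their product.
    """
    values = list(component_availabilities)
    if any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("availabilities must be between 0 and 1")
    return math.prod(values)


# The two cases from the post.
print(f"{system_availability([0.99, 0.99]):.4f}")  # two 0.99 components
print(f"{system_availability([0.99, 0.50]):.4f}")  # one weak component dominates
```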

A famous example of this also comes from the auto industry. Suppose that each of the 20,000 parts in a car has an availability each day of .9999; in this oversimplified example, your daily availability is .9999^20000=.1353--- in other words, your chances of your car working on any given day would only be 13.53%!!! Obviously, a lot of the parts in your car have daily availabilities far greater than .9999!
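
To see how much better than .9999 the parts have to be, the same oversimplified model can be turned around. This Python sketch uses hypothetical function names and a 99% target picked only for illustration; it assumes all parts are identical, independent, and all required.

```python
def car_availability(per_part: float, parts: int) -> float:
    """Daily availability when all `parts` identical parts must work."""
    return per_part ** parts


def required_per_part(target: float, parts: int) -> float:
    """Per-part availability needed to reach `target` for the whole car."""
    return target ** (1.0 / parts)


PARTS = 20_000  # part count used in the example above

print(f"Each part at 0.9999: car works {car_availability(0.9999, PARTS):.2%} of days")

needed = required_per_part(0.99, PARTS)  # 0.99 is an illustrative target
print(f"For a 99% car, each part needs about {needed:.7f}")
```

In this toy model a car that starts 99 days out of 100 needs every part at roughly .9999995, which is why individual part reliabilities have to be so extreme.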

You asked about the dependency of MTBF numbers on the number of units in the field. The answer is that the numbers are _derived_ from all of the units in the field, and the more there are, the more you can believe the number--- even if it's wildly better or worse than your personal experience.

Meanwhile, your daily chances of being home for dinner are directly proportional to the product of all of the availabilities of all of the components in your plant. That's why every successful plant throws out the things they find unreliable and adds more of the things they find reliable. And, since _anything_ can fail, that's why successful plants keep at least one of every component they use as a spare, ready to be put into service if a failure occurs. Even if it has a real MTBF of 380 years.

Hope this helps!

Larry Lawver Rexel / Central Florida
 