Project B49: Mind to Design Reliability as Job One

   Let’s get started with a theoretical open source product concept and describe what needs to be done to get it from mind’s eye to finished design, while closely adhering to best practices for building in the highest quality and reliability possible for a given budget.  Our budget for the design, documentation, components, PCBA, enclosure, and test verification is going to be $10,000 US dollars, which we will raise through Kickstarter or Indiegogo under the title of “First Million Hour MTBF Software Defined Radio for less than $100”.

   I think it’s a good idea to start with a motto for this blog. So here it is:

“B49 or the start of your day, get your Reli. check points out of the way”

  1. Multi-sourced Component Selection
  2. Quality Data or Life Test Data on hand or in process
  3. Simulation Data: MATLAB, SPICE, Verilog, or VHDL models available for all components on the Bill of Materials
  4. Component Availability within the project acceptable lead times

Before we go any further, it is important to describe in some detail exactly what the term MTBF means. First, though, a word on the proper role of MTBF in determining product reliability. The use of MTBF has its limits. It assumes that the hazard rate (failure rate) of the test item is constant. If it is not, for instance if the unit is not yet past the infant mortality phase or is already in the wear-out phase of its life, then MTBF will not be a useful tool. MTBF is not a complete reliability treatment; it only represents the average time between failures. If the test conditions are met, however, it does provide some measure of the reliability of the test unit. For a more complete analysis we will be using HALT (Highly Accelerated Life Testing), which allows us to find the particular failure mechanisms and design them out of the product, and ALT (Accelerated Life Testing), which lets us analyze the failure mechanisms we’ve identified so we can characterize the reliability of the product with respect to them. To assist us in this process we will be using Minitab, an industry standard tool for statistical analysis, which will also help us determine a Reliability Demonstration Test Plan (RDTP). A very thorough Wikipedia definition of MTBF follows:

Mean time between failures (MTBF) describes the expected time between two failures for a repairable system, while mean time to failure (MTTF) denotes the expected time to failure for a non-repairable system. For example, three identical systems starting to function properly at time 0 are working until all of them fail. The first system failed at 100 hours, the second failed at 120 hours and the third failed at 130 hours. The MTBF of the system is the average of the three failure times, which is 116.667 hours. If the systems are non-repairable, then their MTTF would be 116.667 hours.

In general, MTBF is the “up-time” between two failure states of a repairable system during operation as outlined here:

For each observation, the “down time” is the instantaneous time it went down, which is after (i.e. greater than) the moment it went up, the “up time”. The difference (“down time” minus “up time”) is the amount of time it was operating between these two events.

Once the MTBF of a system is known, the probability that any one particular system will be operational at time equal to the MTBF can be calculated. This calculation requires that the system is working within its “useful life period”, which is characterized by a relatively constant failure rate (the middle part of the “bathtub curve”) when only random failures are occurring. Under this assumption the probability of surviving to time t is R(t) = e^(−t/MTBF), so any one particular system will survive to its calculated MTBF with a probability of e^(−1) ≈ 36.8% (i.e., it will fail before then with a probability of 63.2%). The same applies to the MTTF of a system working within this time period.
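The two numbers quoted so far (116.667 hours and 36.8%) can be reproduced with a short Python sketch. The function names are hypothetical, chosen only for this illustration, and the survival formula assumes the constant failure rate of the useful life period.

```python
import math


def mean_failure_time(failure_times_h):
    """Average of the observed failure times, in hours."""
    return sum(failure_times_h) / len(failure_times_h)


def survival_probability(t_h, mtbf_h):
    """Probability of surviving to time t_h, assuming a constant
    failure rate (exponential model): R(t) = exp(-t / MTBF)."""
    return math.exp(-t_h / mtbf_h)


# The three systems from the example: failures at 100 h, 120 h and 130 h.
mtbf = mean_failure_time([100.0, 120.0, 130.0])
print(round(mtbf, 3))                              # 116.667
print(round(survival_probability(mtbf, mtbf), 3))  # 0.368
```

Note that the 36.8% figure does not depend on the MTBF value itself: at t = MTBF the exponent is always −1.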

MTBF value prediction is an important element in the development of products. Reliability engineers and design engineers often use reliability software to calculate a product’s MTBF according to various methods and standards (MIL-HDBK-217F, Telcordia SR-332, Siemens Norm, FIDES, UTE 80-810 (RDF2000), etc.). The MIL-HDBK-217 reliability calculator manual in combination with RelCalc software (or other comparable tool) enables MTBF reliability rates to be predicted based on design.

[Figure: Time between failures]

A concept which is closely related to MTBF, and is important in the computations involving MTBF, is the mean down time (MDT). MDT can be defined as mean time which the system is down after the failure. Usually, MDT is considered different from MTTR (Mean Time To Repair); in particular, MDT usually includes organizational and logistical factors (such as business days or waiting for components to arrive) while MTTR is usually understood as more narrow and more technical.


Formal definition of MTBF and MDT

By referring to the figure above, the MTBF of a component is the sum of the lengths of the operational periods divided by the number of observed failures:

$$\text{MTBF} = \frac{\sum (\text{start of downtime} - \text{start of uptime})}{\text{number of failures}}.$$

In a similar manner, MDT can be defined as

$$\text{MDT} = \frac{\sum (\text{start of uptime} - \text{start of downtime})}{\text{number of failures}}.$$
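As a minimal sketch of these two definitions, the Python below computes MTBF and MDT from a log of up/down cycles. The `Cycle` record and the sample timestamps are assumptions made up for this illustration; the uptimes were chosen to match the 100, 120 and 130 hour example above.

```python
from typing import NamedTuple


class Cycle(NamedTuple):
    up_start: float    # hour at which the system started operating
    down_start: float  # hour at which it failed
    repaired: float    # hour at which it was back up (start of next uptime)


def mtbf_and_mdt(cycles):
    """Return (MTBF, MDT) in hours for a list of completed cycles.

    MTBF = sum(start of downtime - start of uptime) / number of failures
    MDT  = sum(start of uptime - start of downtime) / number of failures
    """
    n_failures = len(cycles)
    if n_failures == 0:
        raise ValueError("at least one observed failure is required")
    total_up = sum(c.down_start - c.up_start for c in cycles)
    total_down = sum(c.repaired - c.down_start for c in cycles)
    return total_up / n_failures, total_down / n_failures


# Hypothetical log: uptimes of 100, 120 and 130 h, downtimes of 4, 6 and 2 h.
log = [Cycle(0, 100, 104), Cycle(104, 224, 230), Cycle(230, 360, 362)]
mtbf, mdt = mtbf_and_mdt(log)
print(round(mtbf, 3), mdt)  # 116.667 4.0
```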

The MTBF can alternatively be defined as the expected value of the time until failure, written in terms of its probability density function ƒ(t):

$$\text{MTBF} = \int_0^\infty t\, f(t)\, dt.$$
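As a worked case, assume the constant failure rate of the useful life period and call it λ (a symbol introduced here for illustration). The time until failure is then exponentially distributed, and the integral evaluates in closed form using integration by parts:

$$f(t) = \lambda e^{-\lambda t}, \qquad \text{MTBF} = \int_0^\infty t\, \lambda e^{-\lambda t}\, dt = \left[-t e^{-\lambda t}\right]_0^\infty + \int_0^\infty e^{-\lambda t}\, dt = \frac{1}{\lambda}.$$

Under this assumption, a one million hour MTBF corresponds to λ = 10^(−6) failures per hour.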

MTBF and MDT for networks of components

Two components $c_1, c_2$ (for instance hard drives, servers, etc.) may be arranged in a network, in series or in parallel. The terminology is here used by close analogy to electrical circuits, but has a slightly different meaning. We say that the two components are in series if the failure of either causes the failure of the network, and that they are in parallel if only the failure of both causes the network to fail. The MTBF of the resulting two-component network with repairable components can be computed according to the following formulae, assuming that the MTBF of both individual components is known:[6][7]

$$\text{mtbf}(c_1; c_2) = \frac{1}{\frac{1}{\text{mtbf}(c_1)} + \frac{1}{\text{mtbf}(c_2)}} = \frac{\text{mtbf}(c_1) \times \text{mtbf}(c_2)}{\text{mtbf}(c_1) + \text{mtbf}(c_2)},$$

where $c_1; c_2$ is the network in which the components are arranged in series.

For a network containing parallel repairable components, finding the MTBF of the whole system requires knowing the components’ respective MDTs in addition to their MTBFs. Then, assuming that MDTs are negligible compared to MTBFs (which usually holds in practice), the MTBF for the parallel system consisting of two parallel repairable components can be written as follows:[6][7]

$$\begin{aligned}
\text{mtbf}(c_1 \parallel c_2) &= \frac{1}{\frac{1}{\text{mtbf}(c_1)} \times \text{PF}(c_2, \text{mdt}(c_1)) + \frac{1}{\text{mtbf}(c_2)} \times \text{PF}(c_1, \text{mdt}(c_2))} \\[1em]
&= \frac{1}{\frac{1}{\text{mtbf}(c_1)} \times \frac{\text{mdt}(c_1)}{\text{mtbf}(c_2)} + \frac{1}{\text{mtbf}(c_2)} \times \frac{\text{mdt}(c_2)}{\text{mtbf}(c_1)}} \\[1em]
&= \frac{\text{mtbf}(c_1) \times \text{mtbf}(c_2)}{\text{mdt}(c_1) + \text{mdt}(c_2)},
\end{aligned}$$

where $c_1 \parallel c_2$ is the network in which the components are arranged in parallel, and $\text{PF}(c, t)$ is the probability of failure of component $c$ during the “vulnerability window” $t$.

Intuitively, both these formulae can be explained from the point of view of failure probabilities. First of all, note that the failure rate of a system is the inverse of its MTBF, so the probability of the system failing within a short interval is approximately the length of that interval divided by the MTBF. Then, when considering components in series, failure of any component leads to the failure of the whole system, so (assuming that failure probabilities are small, which is usually the case) the probability of failure of the whole system within a given interval can be approximated as the sum of the failure probabilities of the components. With parallel components the situation is a bit more complicated: the whole system will fail if and only if, after one of the components fails, the other component fails while the first component is being repaired. This is where MDT comes into play: the faster the first component is repaired, the smaller the “vulnerability window” for the other component to fail.

Using similar logic, MDT for a system out of two serial components can be calculated as:[6]

$$\text{mdt}(c_1; c_2) = \frac{\text{mtbf}(c_1) \times \text{mdt}(c_2) + \text{mtbf}(c_2) \times \text{mdt}(c_1)}{\text{mtbf}(c_1) + \text{mtbf}(c_2)},$$

and for a system out of two parallel components MDT can be calculated as:[6]

$$\text{mdt}(c_1 \parallel c_2) = \frac{\text{mdt}(c_1) \times \text{mdt}(c_2)}{\text{mdt}(c_1) + \text{mdt}(c_2)}.$$

Through successive application of these four formulae, the MTBF and MDT of any network of repairable components can be computed, provided that the MTBF and MDT are known for each component. In the special but all-important case of several serial components, the MTBF calculation can be easily generalised into

$$\text{mtbf}(c_1; \dots; c_n) = \left(\sum_{k=1}^{n} \frac{1}{\text{mtbf}(c_k)}\right)^{-1},$$

which can be shown by induction,[8] and likewise

$$\text{mdt}(c_1 \parallel \dots \parallel c_n) = \left(\sum_{k=1}^{n} \frac{1}{\text{mdt}(c_k)}\right)^{-1},$$

since the formula for the mdt of two components in parallel is identical to that of the mtbf for two components in series.
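The four two-component formulae can be sketched in Python as follows. The `Component` class, the function names and the sample figures (100,000 hour MTBF, 24 hour MDT) are assumptions for illustration only, and the parallel MTBF formula is valid only when MDT is negligible compared to MTBF, as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    mtbf: float  # mean time between failures, hours
    mdt: float   # mean down time, hours


def series(a, b):
    """Network that fails when either component fails."""
    mtbf = (a.mtbf * b.mtbf) / (a.mtbf + b.mtbf)
    mdt = (a.mtbf * b.mdt + b.mtbf * a.mdt) / (a.mtbf + b.mtbf)
    return Component(mtbf, mdt)


def parallel(a, b):
    """Network that fails only when both components are down.
    Assumes MDT is negligible compared to MTBF."""
    mtbf = (a.mtbf * b.mtbf) / (a.mdt + b.mdt)
    mdt = (a.mdt * b.mdt) / (a.mdt + b.mdt)
    return Component(mtbf, mdt)


# Hypothetical drive: 100,000 h MTBF, 24 h to replace.
drive = Component(mtbf=100_000, mdt=24)
print(series(drive, drive))    # Component(mtbf=50000.0, mdt=24.0)
print(parallel(drive, drive))  # MTBF of about 2.08e8 h, MDT of 12.0 h
```

Because both functions return a `Component`, they can be nested to evaluate larger networks, for example `series(parallel(drive, drive), controller)` for a mirrored pair behind a single (hypothetical) controller.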

Variations of MTBF

There are many variations of MTBF, such as mean time between system aborts (MTBSA), mean time between critical failures (MTBCF) or mean time between unscheduled removal (MTBUR). Such nomenclature is used when it is desirable to differentiate among types of failures, such as critical and non-critical failures. For example, in an automobile, the failure of the FM radio does not prevent the primary operation of the vehicle. Mean time to failure (MTTF) is sometimes used instead of MTBF in cases where a system is replaced after a failure, since MTBF denotes time between failures in a system which is repaired. MTTFd is an extension of MTTF, where MTTFd is only concerned about failures which would result in a dangerous condition.

MTTF and MTTFd calculation

$$\text{MTTF} \approx \frac{B_{10}}{0.1\, n_{\text{op}}}, \qquad \text{MTTFd} \approx \frac{B_{10d}}{0.1\, n_{\text{op}}},$$

where $B_{10}$ is the number of operations that a device will complete before 10% of a sample of those devices would fail, $B_{10d}$ is the same quantity counting only failures to danger (the point at which 10% of the sample would fail dangerously), and $n_{\text{op}}$ is the number of operations (cycles) in one year.[9]
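A small Python sketch of this approximation follows. The function name and the sample values (a B10 of 1,000,000 cycles and 50,000 cycles per year) are hypothetical, and the result is in years because the operation count is per year.

```python
def mttf_from_b10(b10_cycles, n_op_per_year):
    """Approximate MTTF in years: B10 / (0.1 * n_op).

    Pass a B10d value instead of B10 to get MTTFd.
    """
    if n_op_per_year <= 0:
        raise ValueError("n_op_per_year must be positive")
    return b10_cycles / (0.1 * n_op_per_year)


# Hypothetical relay: B10 of 1,000,000 cycles, switched 50,000 times a year.
mttf_years = mttf_from_b10(1_000_000, 50_000)
print(round(mttf_years, 1))  # 200.0
```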

