
Fault Tolerance and Redundancy


  • Mean Time Between Failures (MTBF): the average operating time between failures of a repairable asset. MTBF = total operational time ÷ number of failures.
  • Mean Time to Failure (MTTF): the expected lifetime of a non-repairable asset. MTTF = total operational time ÷ number of devices (each non-repairable device fails only once).
  • Mean Time to Repair (MTTR): how long it takes to correct a fault and restore the system to normal operation. MTTR = total hours of unplanned maintenance ÷ total number of failures.
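The three formulas above can be turned into a small calculator. This is a minimal sketch: the function names and the sample figures in the usage section are hypothetical and chosen only to make the arithmetic easy to follow, not measured data.

```python
# Minimal sketch of the MTBF / MTTF / MTTR formulas.
# Function names and the sample figures below are hypothetical.

def mtbf(total_operating_hours: float, failures: int) -> float:
    """Mean Time Between Failures for a repairable asset."""
    if failures <= 0:
        raise ValueError("MTBF needs at least one failure")
    return total_operating_hours / failures


def mttf(total_operating_hours: float, devices: int) -> float:
    """Mean Time to Failure for non-repairable devices, each run until it fails once."""
    if devices <= 0:
        raise ValueError("MTTF needs at least one device")
    return total_operating_hours / devices


def mttr(unplanned_maintenance_hours: float, failures: int) -> float:
    """Mean Time to Repair: average unplanned maintenance time per failure."""
    if failures <= 0:
        raise ValueError("MTTR needs at least one failure")
    return unplanned_maintenance_hours / failures


if __name__ == "__main__":
    # Hypothetical server: 10,000 h of operation and 4 failures.
    print(mtbf(10_000, 4))  # 2500.0
    # Hypothetical batch of 5 drives whose lifetimes sum to 60,000 h.
    print(mttf(60_000, 5))  # 12000.0
    # Hypothetical: 4 failures that needed 10 h of unplanned maintenance in total.
    print(mttr(10, 4))      # 2.5
```

The guards reject zero counts, since a system with no recorded failures has no measurable MTBF or MTTR.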

Fault Tolerance

A system that can sustain failures in individual components and subsystems and continue to provide similar levels of service is said to be fault tolerant. Fault tolerance can be achieved via redundancy.

  • Redundant Spares
  • Network Links
  • UPSs
  • Backup Strategies
  • Cluster Services
  • Load Balancers
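To illustrate how redundancy lets a system keep serving when one component fails, here is a minimal sketch of a pool of redundant nodes that hands out work round-robin and skips nodes marked unhealthy, roughly in the spirit of a load balancer or cluster failover. The `Node` and `RedundantPool` names, the node names, and the simple `healthy` flag are hypothetical; real load balancers and cluster services use active health checks and far richer policies.

```python
# Minimal sketch: round-robin over redundant nodes, skipping failed ones.
# Class names, node names and the boolean health flag are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class RedundantPool:
    nodes: List[Node]
    _next: int = 0  # round-robin cursor

    def pick(self) -> Node:
        """Return the next healthy node; fail only if every node is down."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node.healthy:
                return node
        raise RuntimeError("all redundant nodes have failed")


if __name__ == "__main__":
    pool = RedundantPool([Node("web-a"), Node("web-b"), Node("web-c")])
    pool.nodes[1].healthy = False  # simulate a failed component
    print([pool.pick().name for _ in range(4)])
    # ['web-a', 'web-c', 'web-a', 'web-c']
```

Service continues as long as at least one node in the pool is healthy; the pool raises an error only when every redundant node has failed.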