New Family of Fault-tolerant Computing Platforms Delivers Seven Nines Availability

November 15, 2023 by Seth Price

Stratus has released its most fault-tolerant line of industrial computers yet, designed for maximum system availability.

Stratus Technologies has unveiled its latest line of fault-tolerant edge computers. The ztC Endurance platform is the culmination of more than four decades of development in resilient, reliable computing. Initial testing and implementation of the ztC Endurance computers show 99.99999%, or “seven nines,” system availability, meaning they are designed to keep running despite almost any fault.
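To put that figure in perspective, an availability percentage converts directly into allowed downtime per year. The following is a minimal sketch of that arithmetic, assuming a 365-day year and that availability means the fraction of time the system is up; the function and constant names are illustrative and do not come from Stratus.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap, 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


# Compare a few common availability levels.
for label, pct in [
    ("three nines", 99.9),
    ("five nines", 99.999),
    ("seven nines", 99.99999),
]:
    print(f"{label}: {downtime_seconds_per_year(pct):.2f} s/year")
```

Under these assumptions, seven nines works out to roughly three seconds of downtime per year, compared with more than five minutes for five nines.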

 

A fault-tolerant system will continue operating despite experiencing errors in one or more parts. Image used courtesy of Adobe Stock

 

What Is Fault Tolerance?

Fault tolerance is the ability of a computing platform to keep operating correctly when one or more of its components fail. In the days of Windows 95 and 98, everyone dreaded the “Blue Screen of Death” (BSOD), which meant a low-level fault had caused the computer to stop functioning. A more fault-tolerant system would examine the error and determine what must happen to keep the system functioning instead of halting.

While a BSOD was a nuisance on a home computer, it could spell disaster in a manufacturing environment. A hard stop, such as the BSOD, interrupted communication between devices, caused data loss, and could leave machines in an unknown state. As an example, I used to have an X-ray diffractometer that, after a hard crash, would pull a random number for the detector angle. If this mistake was not caught in time, the next time the software was executed, instead of positioning the detector between 0 and 90 degrees along a track, it could have attempted to go to -6348 degrees, physically crashing the detector into the cabinet.
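One common software defense against this kind of failure is to validate a stored setpoint before commanding any motion. The sketch below is hypothetical: it is not the diffractometer's actual control software, and the limits and function name are assumptions based only on the 0 to 90 degree track described above.

```python
import math

# Assumed travel limits, taken from the 0 to 90 degree track in the example.
DETECTOR_MIN_DEG = 0.0
DETECTOR_MAX_DEG = 90.0


def validate_detector_angle(angle_deg: float) -> float:
    """Return the angle unchanged if it is safe to command, else raise ValueError."""
    if not math.isfinite(angle_deg):
        raise ValueError(f"detector angle is not a finite number: {angle_deg!r}")
    if not DETECTOR_MIN_DEG <= angle_deg <= DETECTOR_MAX_DEG:
        raise ValueError(
            f"detector angle {angle_deg} is outside "
            f"[{DETECTOR_MIN_DEG}, {DETECTOR_MAX_DEG}] degrees"
        )
    return angle_deg


# A corrupted value recovered after a crash is rejected before any motion starts.
try:
    validate_detector_angle(-6348.0)
except ValueError as err:
    print(f"Move refused: {err}")
```

A check like this does not prevent the crash itself, but it keeps a corrupted value from turning into a physical collision.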

Computers are now built to be fault-tolerant to limit these hard crashes. Fault tolerance can be achieved by using redundant systems in the chipset so that if one thread has a fault, other threads can pick up the slack, then investigate and reset the faulty one. Besides redundancy, fault tolerance can take the form of load balancing and distributed systems, where processing occurs in multiple locations so that no single processor is overloaded.
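The redundancy idea can be illustrated in software with a simple failover loop. This is a minimal sketch under stated assumptions, not a description of how any particular product implements fault tolerance: the replicas are plain Python functions, a fault is simulated by raising an exception, and all names are hypothetical.

```python
from typing import Callable, List, Sequence


def run_with_failover(
    replicas: Sequence[Callable[[float], float]], reading: float
) -> float:
    """Try each redundant replica in order and return the first successful result."""
    errors: List[str] = []
    for index, replica in enumerate(replicas):
        try:
            return replica(reading)
        except Exception as exc:  # a fault in one replica must not stop the others
            errors.append(f"replica {index}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


def faulty_scaler(reading: float) -> float:
    """Simulated faulty unit that always fails."""
    raise RuntimeError("simulated hardware fault")


def healthy_scaler(reading: float) -> float:
    """Simulated healthy unit that doubles the reading."""
    return reading * 2.0


# The first replica fails, so the second one picks up the work.
result = run_with_failover([faulty_scaler, healthy_scaler], 21.0)
print(result)
```

The caller receives a result as long as at least one replica is healthy, which is the essential property redundancy is meant to provide.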



Stratus Technologies’ ztC Endurance computing platforms offer predictive fault tolerance to deliver high system availability. Image used courtesy of Stratus Technologies

Stratus ztC Endurance

Stratus intends for the ztC Endurance to be one of the most reliable computing platforms on the market. With seven nines availability, these fault-tolerant platforms are designed to ride through the problems that industry can throw at them, from small power bumps to missed communication handshakes, without a system crash.

At the heart of the ztC Endurance is a 4th Gen Intel Xeon Scalable processor (code-named Sapphire Rapids) paired with high-performance DDR5 memory. The platform is fully scalable and modular, meaning components can be replaced or upgraded as needed. Three models, the 3100, 5100, and 7100, are available for small, medium, and large plants, respectively. The ztC Endurance can also run third-party cybersecurity protection without bogging down the system, a distinct advantage over many competing systems.

Fault-tolerant Computing

Thanks to advances in fault tolerance and machine controls, incidents like the X-ray diffractometer example are now rare, but hard crashes can still leave machines behaving unpredictably. In the case of the X-ray diffractometer, the detector could have been damaged, but its movement was confined to an interlocked enclosure, meaning the chances of injury were almost zero.

In industry, unpredictable operation of machinery could lead to injury or death should a machine lose its coordinate system or its communication with sensors, limit switches, or other such components. At first glance, it is tempting to think of “seven nines” only as a quality and efficiency metric, but it is also important to consider safety as another advantage of system fault tolerance.