Software and Critical/Safety Control Systems


Thread Starter


I'm curious where I could read up on or discuss the topic of appropriate operating system selection for certain safety control systems.

FWIW I am not in a position to select control system OS's. I'm a humble observer (and mechanical engineer) for a company that makes safety control systems.

Assuming you have no legacy code or design to deal with, how would you select an appropriate OS for a critical safety control system?

Examples: Nuclear power plants, modern civil and military aircraft, GPS positioning systems for drillships, well blowout control systems.

System stability seems important and revision control seems important. Everything else seems to be derivative.

How do you qualify such a system as safe? Redundant? Safe enough to keep thousands of people out of harm's way, etc.?

I'm curious what the community thinks. Maybe the right question to ask is: what should you avoid at all cost?

curt wuollet

There is immediately a big problem: there is essentially no choice with any of the major vendors. But, hypothetically, for true critical safety systems, you can lean on authorities that have audited and certified OSes for critical applications, like the FAA and other government bodies. Or you can check out the credentials of those who sell this type of system. And you can do your own logical selection. Logically, it should be as small and narrow in scope as it can be and still accomplish what you need. And it should be Open Source, so that you can audit and verify exactly what any particular function does. It should have established reliability, verified by disinterested third parties. And your decisions should then be defensible. Or you can simply go for Windows; it meets none of those criteria, but it sure is popular, and even your boss would be happy because he can play golf on it.

I haven't gone through the process, but I have looked at what is involved. What you have to look for is what the applicable regulations are for the particular industry. Then you have to look at what the industry practice is when it comes to meeting those regulations. The legislation will state a goal, but the application of it will depend on compromises worked out between the relevant industry and the government regulators.

For aircraft you have to look at aircraft-specific regulations, and those will vary from country to country (although usually not that much). For ships you have to look at marine regulations. In addition, there may be different regulations which apply to cargo aircraft or ships versus passenger aircraft or ships.

When it comes to complex operating systems, the bar seems to be set fairly low (judging by some of what gets used). However, everything has to go through some sort of certification process, which is usually very time consuming and expensive. The certification process isn't really intended to fix any problems; it's just intended to make sure that nobody can be held to blame when something inevitably does go wrong.

Because of the cost of certification and the limited size of the market, the software that *does* get certified isn't necessarily the "best". Rather, it's a matter of whether some particular vendor was interested enough in that niche market to spend the time and money going through the certification process.

In many cases, the stuff that is *really* safety critical doesn't use an OS. Instead it uses a lot of custom code and some third party libraries. An OS isn't really necessary, because the functions are deliberately limited to what can reasonably be tested. Real safety doesn't come from any one component. It comes from the overall system design where you have to expect things to fail and allow for that in your design.
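The "expect things to fail and allow for that" principle can be sketched in a few lines of C. This is a hypothetical example, not taken from any real system; the signal names, the 100 ms staleness timeout, and the pressure limit are all invented for illustration. The point is the shape of the logic: the tripped (safe) state is the default, and a faulted or stale sensor is treated exactly like a dangerous reading.

```c
#include <stdbool.h>

/* Hypothetical fail-safe check: names, thresholds, and timeout are
 * illustrative only. The safe (tripped) state is the default; staying
 * in RUN requires every check to pass. */
typedef enum { STATE_TRIPPED = 0, STATE_RUN = 1 } sys_state;

sys_state evaluate(double pressure_psi, bool sensor_ok,
                   unsigned ms_since_update)
{
    if (!sensor_ok)             return STATE_TRIPPED; /* sensor fault */
    if (ms_since_update > 100u) return STATE_TRIPPED; /* stale data   */
    if (pressure_psi > 250.0)   return STATE_TRIPPED; /* over limit   */
    return STATE_RUN;
}
```

Note the burden of proof is on *staying* in RUN, not on tripping: losing the sensor, or simply not hearing from it, drives the system to the safe state without any special-case code.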

However, there is a very broad middle ground between critical and non-critical, and that's where a lot of things can go wrong unexpectedly. Something that wasn't supposed to be safety critical becomes safety critical because people rely on it to give them correct information before they decide whether or not a problem really exists. Or sometimes the non-safety critical component interferes with the safety critical ones, and prevents them from functioning correctly.

You mentioned two examples that made me laugh, because they show what shaky foundations a lot of systems are built on. You mentioned "well blowout control systems". I suppose you were thinking of the company that BP hired to drill their well for them in the US? There were news reports recently that the well monitoring software system was suffering constant "Blue Screen Of Death" problems. According to the news, the OS used there was MS Windows NT 4. That version of Windows was notorious for falling over its own feet all the time, so I can't imagine that reliability was even a consideration when someone selected it as the basis for their monitoring system.

You also mentioned nuclear power plants. There is a power plant in Ohio in the US which installed a "safety monitoring system" that used an ordinary SCADA system running on MS Windows. It wasn't part of the safety controls, but it recorded how those systems were functioning in order to keep records for historical analysis. It eventually got a virus, as was no doubt bound to happen (I think it was the SQL Slammer worm). The virus was scanning the network so heavily while looking for new victims that the actual safety systems were unable to function, and the reactor control system was "out of control" for about 8 hours while the operators were battling to regain control of it. Fortunately, the reactor was already shut down at the time for other reasons. This is a good example, however, of how something that was supposed to be "non-critical" had dangerous side effects on the critical controls.

You asked "what should you avoid at all cost" - well, I would definitely avoid using Microsoft Windows. And actually Microsoft says you are not supposed to use their products for anything critical either. Their stuff just isn't designed for it and it never will be because it's not a market they have any interest in.
Hi there,

The answer to your question is actually quite simple: you first need to understand the safety priorities of the companies that will be using the safety and ESD (emergency shutdown) systems.

Any company's highest priority is the safety of its workforce during an emergency situation; only then comes the safeguarding of its equipment.

There is too much involved to explain here exactly how this is done, but basically risk assessments and HAZOP studies are used to create various hypothetical scenarios and see how the safety system will react, how it will keep the people safe, and how it will safeguard the equipment.

One good practical implementation of these studies can be seen in the cause-and-effect (C&E) matrix, as well as in areas where a voting system has been implemented. Like I said, there are too many things that come into play during safety and emergency situation studies to mention here, but if you want to test a safety system, or part of one, ask the question: how will it keep the people and equipment safe in a situation like this or that? If you can come up with a hypothetical scenario that the safety system will not be able to cope with, other safety precautions, permit-to-work systems, and procedures can be implemented for those situations as well. That paperwork and those procedures also form part of the safety system as a whole, so keep that in mind too. It is not just field instrumentation, control systems, computers, and software.
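The voting systems mentioned above can be made concrete with a minimal sketch in C of a 2-out-of-3 (2oo3) voter, a common arrangement for trip signals: the trip is asserted when any two of the three redundant channels agree, so a single failed sensor can neither cause a spurious trip nor block a real one. The function name and interface here are my own, purely illustrative.

```c
#include <stdbool.h>

/* 2oo3 majority voter: returns true when at least two of the three
 * redundant channel inputs demand a trip. Illustrative sketch only. */
bool vote_2oo3(bool a, bool b, bool c)
{
    return (a && b) || (a && c) || (b && c);
}
```

With one channel stuck high the voter still requires a second channel to agree before tripping; with one channel stuck low, the remaining two can still force the trip.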

So basically, the operating system selection you mentioned in your question is only a small part of the overall process of designing a safety system, and the same questions will be asked and scenarios created to see how stable and reliable an operating system will be under various emergency conditions.

James Ingraham

Look for people who know what they're doing with it. Like, for example, Green Hills Software and their Integrity RTOS, which has DO-178B Level A certification packages.

Or QNX, which has IEC 61508 SIL 3 certification.

> How do you qualify such a system as safe? Redundant? Safe enough to keep thousands of people out of harm's way, etc. <

I don't know. I don't actually have to do that. But it sure is important.

> I'm curious what the community thinks. Maybe the right question is to ask, what should you avoid at all cost? <

Avoid roll-your-own whenever possible. Avoid the use of global variables. (Both of these are showcased in the famous Therac-25 radiation overdose case.) Always use an appropriate tool for the job. If you need a safety-stamped OS, you can't use Linux, Windows, or even VxWorks. Build on the work of others, e.g. MISRA C, the JSF C++ coding standards, etc.
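To make the "avoid global variables" advice concrete: part of what went wrong in Therac-25 involved shared state that one task could change while another assumed it was still valid. Here is a hypothetical contrast in C; the struct fields, function name, and dose limit are all invented for illustration, not taken from any real device. Instead of firing based on a global flag, the state is passed in explicitly and every precondition is re-checked at the point of use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical treatment record: fields and the hard limit below are
 * invented for illustration only. */
typedef struct {
    int  dose_set;   /* operator-entered dose            */
    bool verified;   /* interlock re-verified after edit */
} treatment;

/* No globals: the caller hands in the exact state to be checked, and
 * the checks are repeated here even if a caller "already" did them. */
bool ok_to_fire(const treatment *t)
{
    if (t == NULL)         return false; /* defensive null check    */
    if (!t->verified)      return false; /* interlock not confirmed */
    if (t->dose_set <= 0)  return false; /* nonsense dose           */
    if (t->dose_set > 100) return false; /* illustrative hard limit */
    return true;
}
```

The design choice is that editing the record can only clear `verified`, never set it, so a stale "OK" can't survive a change the check never saw.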

Just my $0.02. I hope nobody goes out and builds a nuclear power plant based on what I say.

-James Ingraham
Sage Automation, Inc.

curt wuollet

Yes, when dealing with something as regulated as this, with the liabilities involved, you can go a long way by studying the outfits that have spent all the money and jumped through all the hoops. Err, that's if _their_ stuff works. But I do see some companies that should know better doing things that I think are insane; it must work, or they must have a massive legal budget. Or, I suppose, they just fold up their tents if something happens.

Trusting government bodies didn't seem to help with Three Mile Island... or Deepwater Horizon.

I take some issue with the open source suggestion. Yes, you could audit every function and trace how it works... or you could write it yourself. Being qualified to do one would qualify you to do the other, and then you'll also have some security (for IP reasons) and revision control over your code.

curt wuollet

No, the government agencies don't make it safer. No other regulatory entity does either. But they can make it legal or at least defensible.

And if you feel safer with code that nobody can audit, I've got a used car for you, sight unseen. By the time you get down to something that anyone will certify, your choices for a particular application will be quite limited. You might well _have_ to write your own. If it were my Iron Lung, it would be OSS or mechanical.

Perhaps this just shows how little I know about OSS... If anyone can audit it, can anyone hack it? Can a competitor easily learn something important about your chemical process control, or a saboteur easily insert a malicious bug? Obfuscation is one benefit of custom/proprietary code.
Let's say a "plumber" gets access to a terminal in your large plant and knows what you're running.

There's no perfect defense. This is especially true of physical security. Then what?

curt wuollet

Well, first of all, if you don't have physical security, you have very little security, whether the system is open or closed. But we are talking about safety and reliability, not security, which is a whole other animal.

Anyone in the world can read the entire source to the OS running on this machine. Its reliability is excellent; uptime is 100%, excluding when I take it down to blow it out. It may or may not be up to safety or critical standards, depending on the need. How does the fact that it's all OSS make it any less reliable? In fact, the leading closed competitor has a rather dismal, but improving, record on reliability. It's almost usable for non-critical tasks now. And obviously, closing the source for Windows has not made it more secure.

But you are also confusing the OS with the application. The application is generally quite specific in critical/safety situations. It too should be auditable, or at least reviewed. Such applications are seldom published, but having the source makes them auditable. And if an application is well written, publishing the source should not make it less safe or more vulnerable. It is unlikely to be kept on the production system for the plumber to find. Your questions relate to how you protect your systems, rather than whether the OS is safe and reliable. But in general: no to all of the above.

In reply to anon, if you have been following the news lately, you would see that there is a virus which is going around which is targeted at (Siemens) SCADA systems, and which is hoovering up data from databases for reasons which no one is sure of at this point. The application software, database, and operating system are all closed source proprietary products (Siemens and Microsoft).

To go back to your examples, how would this competitor get into your computer systems? Probably not by studying your source code. And if he could get in (perhaps your password was "1234"), I'm pretty sure that he isn't going to need the source code to do whatever he wants. After all, thousands of hackers are doing whatever they want with closed source systems right now. The easiest thing to do would be to use an off-the-shelf virus, and I don't think I have to tell you where you will find lots of those.

Banks, stock markets, and major Internet infrastructure systems run on Linux or BSD with relatively few problems despite the huge amounts of money and bandwidth they represent. Tens of thousands of viruses take over MS Windows systems every single day despite the negligible reward to be gained. Look at the record. Look at the *facts*. Lack of source code has really been no hindrance to people designing viruses for closed source systems so far, has it?

A secure system does not depend on being so obscure that few people know how it works. A genuinely secure system is secure even if all the details of it are public *except* your password. If you read any commentary on security, you will find a maxim repeated over and over again: "security by obscurity does not work".