Mark VIe IO Pack Failures

Hi Folks

I'm very new to the Mark VIe control system and am trying to learn from the manuals and any other material I can find. However, I have an urgent issue and need help from experienced users.

I found that my gas turbine tripped with one IO pack failure (PAIC on IONet 3).
However, this card is configured as TMR, and the other two packs are still healthy.

The alarm log shows a set of "Inputs unhealthy on IO Module XX" alarms (for all packs connected to IONet 3) and "T pack IONet 3 - Message Timeout [Alarm ID 1541, UCSA-0, Controller S]", which looks as if the controller lost communication with all packs on IONet 3.

On further checking:
1. Our power supply is normal; there is no ground fault or loss of power supply.
2. The network switches are normal. In fact, our N-Tron network switches were replaced about 4 years ago in accordance with TIL 1939-R1, Improved Mark VIe Network Switch Availability.

My questions are:
1. If one of the three TMR IO packs fails, can it cause the unit to trip? My understanding is that the other two should still be able to run the system.
2. Can a single IO pack failure cause "Inputs unhealthy" alarms on all the IO packs on the same IONet, as we found? In my view this would only be reasonable for a network failure (cable, switch, etc.), not for a single IO pack failure.
3. Our Mark VIe has some simplex IO modules, such as a PRTD with a single IONet. I'm not sure whether a simplex IO module failure will cause a unit shutdown, or whether this depends on the software configuration. In our system, the inputs connected to this pack are for alarm only and are not related to any safety function, protection function, or control sequence.
4. Finally, we are seeing an increasingly high IO pack failure rate. My unit was installed in 2009, so it is now around 12 years old. I don't think that is too old (but not too young either). However, the failure rate seems very high compared with other brands of control system at the same age. I'm not sure whether you are seeing this problem too.

Looking forward to your replies, and thanks for your help.
 
Dear NTH_P,
I am reading your post and trying to make sure I understand your situation; some of your statements are not perfectly clear. If you were to upload a copy of your alarm log, that may assist us. My experience has been that IF you have an analog input card configured with 3 I/O packs in a TMR setup, then failure of a single I/O pack should not cause a unit to trip.

The alarm log shows a set of "Inputs unhealthy on IO Module XX" alarms (for all packs connected to IONet 3) and "T pack IONet 3 - Message Timeout [Alarm ID 1541, UCSA-0, Controller S]", which looks as if the controller lost communication with all packs on IONet 3.

Reading your statement above, I/O net 3 should be for "T" core.
"Message Timeout [Alarm ID 1541, UCSA-0, Controller S]": the alarm here indicates the "S" core, which should be I/O net 2.

My questions are:
1. If one of the three TMR IO packs fails, can it cause the unit to trip? My understanding is that the other two should still be able to run the system.
If the system is configured correctly then this should be true.

2. Can a single IO pack failure cause "Inputs unhealthy" alarms on all the IO packs on the same IONet, as we found? In my view this would only be reasonable for a network failure (cable, switch, etc.), not for a single IO pack failure.
A failure of a single I/O pack should not cause a loss of communication on the entire I/O net.

3. Our Mark VIe has some simplex IO modules, such as a PRTD with a single IONet. I'm not sure whether a simplex IO module failure will cause a unit shutdown, or whether this depends on the software configuration. In our system, the inputs connected to this pack are for alarm only and are not related to any safety function, protection function, or control sequence.
If properly configured, inputs that are simplex should be non-critical, or redundant and connected to other I/O nets. RTDs and thermocouples are a special case when it comes to I/O pack redundancy.

4. Finally, we are seeing an increasingly high IO pack failure rate. My unit was installed in 2009, so it is now around 12 years old. I don't think that is too old (but not too young either). However, the failure rate seems very high compared with other brands of control system at the same age. I'm not sure whether you are seeing this problem too.
If your system was installed in 2009, it is an early system with some older revisions of I/O packs etc. You did not say what type of unit you are operating or whether any of the I/O packs are installed in cabinets that are outside. I think GE may have overstated the thermal limits for the I/O packs. I would suggest that the cooler you can keep your control compartments, the longer your system will remain reliable. I might also suggest sending your failing I/O packs to someone like Powergenics for a root cause analysis. If they can identify a similar failing component on each I/O pack, that might help you determine a path forward for keeping your system healthy.

Please give us more information on what type of turbine you operate.
 
NTH_P,

SPECIFICALLY, WHAT WAS THE PROCESS ALARM THAT WAS ANNUNCIATED WHEN THE TURBINE TRIPPED?

WHAT I/O (Inputs/Outputs) ARE CONNECTED TO THE I/O TERMINAL BOARD THE PAIC WAS ON?

1. While it's true that a single Diagnostic Alarm will not cause a unit to trip (in a TMR configuration), there are combinations of Diagnostic Alarms which WILL result in a turbine trip. "Which combinations???!!?!?!" is always the response, with fear and trepidation in the voice. There are literally thousands of possible Diagnostic Alarms in a Mark VIe turbine control system. It is impossible to delineate the precise combinations of Diagnostic Alarms which will result in a unit trip.

The BEST thing one can do to prevent a combination of Diagnostic Alarms from resulting in a unit trip is to resolve Diagnostic Alarms as they are annunciated, or as soon as possible after they are annunciated. Left unresolved, the number of Diagnostic Alarms will simply continue to increase, and after some time it's pretty inevitable that some combination of them is going to cause problems--either with power output or operation or result in a trip. By resolving Diagnostic Alarms quickly and keeping the number of them to an absolute minimum (it IS possible!!!) the likelihood of problems is GREATLY reduced.

2. You really need to provide the exact Diagnostic Alarm Message that was annunciated, and it would be even better if you provided ALL of the Diagnostic Alarms which were active at the time of the problem (I'll bet virtually all of the Diag. Alarms which were active at the time of the problem are STILL active, so that shouldn't be an impediment to providing a list of active Diag. Alarms...). If one of three I/O Packs on an I/O terminal board experiences a problem, that should not result in a problem or trip--but there's always that caveat: Combinations of Diagnostic Alarms can cause problems/trips. For example, let's say the unit had three P2 pressure transmitters (96FG-2x) connected to the I/O terminal board, and 96FG-2B was already failed or failing (there would be Diagnostic Alarms to that effect!!!) and then IONET communications with <T>'s I/O PACK were lost, meaning there was no value of P2 pressure from <T>. That would most likely cause a turbine trip. Unresolved Diagnostic Alarms, large numbers of unresolved Diagnostic Alarms, are a recipe for unexplained problems of all kinds--including trips (eventually).
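The P2 example above can be sketched in a few lines of Python. This is only an illustration of why two coincident problems defeat triple redundancy. It assumes a simple median-select vote and a trip when fewer than two healthy values remain; the function name, the use of None for an unhealthy value and the trip rule are all assumptions for illustration, not the actual Mark VIe voting or protection logic.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def vote_analog(
    values: Sequence[Optional[float]], min_healthy: int = 2
) -> Tuple[Optional[float], bool]:
    """Return (voted_value, trip). None marks an unhealthy or missing value.

    Hypothetical rule: median of the healthy values, and a trip request
    when fewer than `min_healthy` values are available.
    """
    healthy = [v for v in values if v is not None]
    if len(healthy) < min_healthy:
        return None, True
    return median(healthy), False


# All three values healthy: normal median select.
print(vote_analog([250.0, 251.0, 249.0]))
# One value already bad (an unresolved Diagnostic Alarm): still running.
print(vote_analog([250.0, None, 249.0]))
# A second value is then lost: not enough healthy inputs left, so trip.
print(vote_analog([250.0, None, None]))
```

The point is the middle case: the unit keeps running on two values, nobody fixes the first problem, and the next unrelated fault is the one that trips the unit.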

3. The "standard" is that nothing that would trip the turbine should be connected to a SIMPLEX module, so that a loss of a SIMPLEX I/O Pack will not result in a trip of the turbine. In the GE heavy duty gas turbine controls philosophy I/O is "roughly" divided into critical I/O and non-critical I/O. Critical I/O is that I/O which is necessary to operate and protect the unit; if it's really critical then there will usually be redundant sensors or outputs (think Trip Oil pressure switches; or servo-valve outputs; or LVDTs). So, it seems your system was configured properly--nothing that would trip the unit is connected to a SIMPLEX I/O pack.

However, some commissioning and field personnel make unrecommended changes, most often demanded by Customer management, that sometimes require a single input or output which, if not "available" to the application code running in the Mark*, can result in a turbine trip.

In the case of RTDs, the way RTDs work it's not possible to have triple redundant (or even DUAL redundant) power sources constantly providing the power source for the RTDs (current or voltage). (That's one reason GE heavy duty gas turbine controls philosophy uses a lot of thermocouples and seismic vibration sensors--they are "self-powered" (meaning they generate their own signals without an external power source).) In general (and there are always violations of this "standard") RTD inputs are NOT used to trip the turbine; shutdown (normal, fired shutdown) maybe. But, again--there are some Customers who just KNOW BETTER than GE (who has been producing industrial, heavy duty gas turbines for more than 7 decades) how to protect their machines and make them as reliable as possible. (Usually, they are trying to protect the machines from damage caused by inattentive and poorly trained operators--preferring automation (the turbine control system) to protect the machine.)

4. Heat, dust (especially certain kinds of dust), and humidity are the enemy of all electronics. GE's estimates of the maximum allowable ambient temperature that Mark VIe I/O packs (and processors) can be operated in were--and still are--WAY TOO OPTIMISTIC. AND, the heat generated by I/O packs themselves is pretty high. (Want a quick example? If the I/O packs in the turbine control panel are arranged in vertical columns, go to a vertical column of PDIO (discrete input) I/O packs. Put your hand over or near the top of a PDIO at the bottom of the column and then start moving your hand upwards along the column of PDIOs and note how the heat increases as you move your hand higher. At the top, the heat is pretty "high" (as in magnitude, not position). The GE factory in Salem, VA, where the Mark VIe was conceived and designed, even had a "standard", issued when the product was first being released and applied, which was that I/O Packs SHOULD NOT be placed in vertical columns. (They routinely violated this standard, but, these things happen...))

Now, put vertical columns of I/O Packs in a turbine control panel in a hot and dusty environment with no fan to circulate air in the turbine control panel. Usually, there are air conditioners in the rooms where the equipment is located, and quite often, those fail and are not promptly replaced....

Another problem is the setting of the A/C units in the rooms where the Mark VIe is located. People often set the temperature VERY LOW--for their personal comfort, especially in very hot places in the world. If the site is also located in a humid area, then as soon as compartment doors are opened humidity quickly begins to form on the electronics--which are TOO COLD. Combine that with dust and dirt and that's about the worst way to treat ANY electronic equipment.

The A/C units in control compartments/room where Mark VIe equipment is located are primarily for humidity control, secondarily for temperature control. Yes; the two are related. But, properly adjusted, the temperature in the compartment/room can be maintained to reduce humidity in the room while keeping the electronics and electrical components cool, and dry.
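The condensation risk described above can be estimated with a dew point calculation. The sketch below uses the Magnus approximation for dew point over water; the function names and the example temperatures are assumptions for illustration only, and it is a rough check, not a substitute for proper HVAC design.

```python
import math


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Dew point in deg C from air temperature and relative humidity,
    using the Magnus approximation (coefficients for water)."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def will_condense(surface_temp_c: float, air_temp_c: float, air_rh_pct: float) -> bool:
    """True if a surface at surface_temp_c is at or below the dew point
    of the air that reaches it (for example, when a door is opened)."""
    return surface_temp_c <= dew_point_c(air_temp_c, air_rh_pct)


# Hypothetical site: outside air at 35 deg C and 80 % relative humidity.
print(round(dew_point_c(35.0, 80.0), 1))
# Electronics chilled to 18 deg C by an A/C set very low: moisture forms.
print(will_condense(18.0, 35.0, 80.0))
# Electronics kept warmer than the dew point: no condensation.
print(will_condense(33.0, 35.0, 80.0))
```

With those example numbers the dew point comes out at about 31 deg C, so anything in the panel colder than that will collect moisture as soon as outside air gets in.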

But, again--the single most common reason for I/O Pack failures (especially repeated I/O Pack failures in the same location) is heat, and dust/dirt and humidity. And, in my personal opinion, lack of air circulation in the Mark VIe turbine control panel enclosure. Even a small amount of air continually being blown in at the bottom or, probably better, drawn out at the top is very helpful in keeping I/O Packs cool. HOWEVER, if the local environment is dusty/dirty blowing or drawing dusty/dirty air through the turbine control panel is NOT good, either. A happy balance must be achieved--air flow and clean air flow. The compartment/room must be kept clean, especially the floor (and especially if the site environment is dusty/dirty). And suitable filters (not just ANY filter--because ALL filters restrict air flow to a certain extent) must be used, and changed when dirty, to prevent circulating dusty/dirty air around and through the I/O Packs.

And then there's the matter of temperature and humidity in the compartment/room. Too cold is NOT good. Too cold in a humid location is worse. Too hot is not good. A/C units must be properly adjusted--and they must be properly maintained, and promptly replaced if they fail or do not work.

Solve the heat (and dust/dirt) problem and I guarantee you most of your premature I/O Pack failures will quietly go away.

Hope this helps!
 
I believe that a single PAIC T pack failure will NOT cause the unit to trip. From your description, this trip seems to have been caused by a communication failure of the entire T IONET. That may be caused by a faulty UCSA <T> controller, T I/O switch, or PS power supply module. Either way, this is still an abnormal unit trip, because during commissioning of a new unit the TMR redundancy tests are done to ensure that the failure of any single controller, power supply, or IO network will not trip the unit.

A single IO pack configured as TMR, whether a PAIC or even a PDIO or PDIA, will not cause the unit to trip. If the T pack fails, the R and S packs will continue to work without any problems and will not cause any disturbance to the unit.
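For a discrete input, the idea can be shown with a plain two-out-of-three vote. This is a generic 2oo3 sketch with a made-up function name, not the Mark VIe voting code; it only shows that whatever the T pack reports, the result follows R and S when they agree.

```python
def vote_2oo3(r: bool, s: bool, t: bool) -> bool:
    """Generic two-out-of-three vote: true when at least two inputs are true."""
    return (r and s) or (r and t) or (s and t)


# R and S both see the contact closed; T is lost and reads False.
print(vote_2oo3(True, True, False))   # True
# R and S both see the contact open; T is stuck True.
print(vote_2oo3(False, False, True))  # False
```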

However, there are some TMR-configured control devices, such as the servo valve regulating current output on the PSVO/PCCA card, that will show a very short fluctuation, which causes some control disturbance. This is because the servo valve output current is the sum of the currents in the three coils, not a voted output.

For example, if the T coil current of the servo valve is lost due to a T communication failure, the total servo valve current will drop for a very short time, because R and S immediately increase their coil currents to compensate and maintain the total output current. During that brief fluctuation, the position of some control valves, such as the IGV, SRV and GCVs, can fluctuate or oscillate, especially if those valves are moving at the time. If the servo valve regulation becomes unstable, it may cause other problems, such as excessive combustion dynamic pressure, which may trip the unit.
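A toy model of that summed output, in arbitrary current units, shows where the short dip comes from. The coil names, the equal-share compensation and the numbers are assumptions for illustration; the real regulators are closed-loop on valve position and are not modelled here.

```python
from typing import Dict, Optional

Coils = Dict[str, Optional[float]]  # None means that coil driver is lost


def total_current(coils: Coils) -> float:
    """Net servo current is the sum of the coil currents still being driven."""
    return sum(c for c in coils.values() if c is not None)


def rebalance(coils: Coils, target_total: float) -> Coils:
    """Share the target current equally among the remaining healthy coils."""
    healthy = [name for name, c in coils.items() if c is not None]
    if not healthy:
        raise ValueError("no healthy coil drivers left")
    share = target_total / len(healthy)
    return {name: (share if c is not None else None) for name, c in coils.items()}


normal = {"R": 1.0, "S": 1.0, "T": 1.0}
t_lost = {**normal, "T": None}               # instant after T is lost
recovered = rebalance(t_lost, target_total=3.0)

print(total_current(normal), total_current(t_lost), total_current(recovered))
# prints: 3.0 2.0 3.0
```

The middle number is the disturbance: for a moment the valve sees only two thirds of the current it had, until R and S pick up the difference.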

When the unit trips, you should immediately find the first trip signal in the short-term historical alarms. First find L4T=1 (True) or L4=0 (False), and then look for the first trip signal before L4T. These trip signals generally have names like L86xxx/L4xxT/L5/L94xxT. If you can find this signal, you will be able to understand the root cause of the trip.
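If the alarm/event list is exported to something you can script against, the search described above can be sketched as below. The Event layout, the 5 second look-back window, the name pattern and the sample signal names are all assumptions for illustration; adjust them to your own export format and site signal names.

```python
import re
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    time_s: float   # timestamp in seconds (hypothetical export format)
    signal: str
    value: int      # 1 = True, 0 = False


# Name patterns taken from the examples above: L86xxx / L4xxT / L5 / L94xxT.
TRIP_NAME = re.compile(r"^(L86\w*|L4\w+T|L5\w*|L94\w*T)$")


def find_master_trip(events: List[Event]) -> Optional[int]:
    """Index of the first L4T=1 or L4=0 event (events sorted by time)."""
    for i, ev in enumerate(events):
        if (ev.signal == "L4T" and ev.value == 1) or (ev.signal == "L4" and ev.value == 0):
            return i
    return None


def find_first_out(events: List[Event], window_s: float = 5.0) -> Optional[Event]:
    """Earliest trip-named signal that went True shortly before the master trip."""
    idx = find_master_trip(events)
    if idx is None:
        return None
    t_trip = events[idx].time_s
    for ev in events[:idx]:
        if ev.value == 1 and t_trip - ev.time_s <= window_s and TRIP_NAME.match(ev.signal):
            return ev
    return None


# Made-up signal names, for illustration only.
log = [
    Event(10.0, "L86OLD", 1),   # old event, outside the look-back window
    Event(99.2, "L4ABCT", 1),   # candidate first-out
    Event(99.3, "L86XYZ", 1),
    Event(99.4, "L4T", 1),
    Event(99.5, "L4", 0),
]
print(find_first_out(log))
```

The window is there so that a latched signal from hours earlier is not mistaken for the cause of this trip.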

If the unit trips, you should immediately export the short-term historical alarms, because the log is generally configured to keep only 50,000 records, and earlier records will be overwritten by new alarms/events/SOEs after a period of time. In addition, you can use the trip log to analyze the cause of the trip. If the short-term historical alarms have already been overwritten, you can still find the process alarm/diagnostic alarm/event/SOE list included with the trip log trend files. Any unit trip generates three trip trend files with the same timestamp, marked R, S and T. Especially for a unit trip caused by a single controller/IONET/PS, you need to compare and check all three sets of trend data and alarms to find the true cause of the trip.
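When comparing the R, S and T trend data, one simple check is to flag the controller whose value for a signal is far from the middle of the three. The sketch below assumes you have already pulled one sample of the same signal from each file; the function name, the tolerance and the sample numbers are illustrative assumptions, and reading the trend files themselves is not shown.

```python
from statistics import median
from typing import List


def odd_one_out(r: float, s: float, t: float, tolerance: float) -> List[str]:
    """Names of controllers whose value deviates from the median by more than tolerance."""
    values = {"R": r, "S": s, "T": t}
    mid = median(values.values())
    return [name for name, v in values.items() if abs(v - mid) > tolerance]


# T reads zero while R and S agree: T is the one to investigate.
print(odd_one_out(100.0, 100.2, 0.0, tolerance=1.0))   # ['T']
# All three agree: nothing flagged.
print(odd_one_out(100.0, 100.2, 99.9, tolerance=1.0))  # []
```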
 