Mark VI VPRO Failure Caused Unit Trip

Thread Starter

Chris

Hi,

We recently had a unit trip (steam turbine/compressor drive) due to the failure of a single VPRO card in <X>.

The control system is a TMR GE Mark VI. We are running UCVG processors with VCMI H2Bs (I believe) in each core. The problem was resolved and the unit was restarted after the <X> VPRO was replaced with a new one. The following things were noted during the initial troubleshooting efforts:

All (3) processors (r,s,t) were found in the "Boot State". IO State 0x67 and Boot state 0xc3. Also was saying "Missing any IO NET Connections" or something (don't have my notes at home with me). Not sure what caused them to reboot??

All three processors (R, S, T) could be pinged successfully, with responses received.

There were no diagnostic alarms on the UCVGs. The only diagnostic alarm on the VCMIs (all three cores) was "using default data rack r8". Of course, this is why we decided to change the <X> VPRO.

Toolbox was able to view diagnostic data for all VME cards except the <X> VPRO. <Y> and <Z> displayed no diagnostic alarms.

The Mark VI HMI was not displaying any live data, only #### signs. There were a few alarms active, such as "125VDC ground", "AC source #1 lost", "AC source #2 lost", and "125VDC source lost". Not sure why these alarms were active; we have no indication that any power sources "blipped", and it seems even odder since the HMI was not communicating with the processors. There was another alarm indicating that the HMI link to the Mark VI was lost and CIMPLICITY was not communicating.

Upon review of the processor diagnostics, we found that the "XMIT SUSPENDED. CPU SWITCHED" diagnostic alarm toggled a couple of times before the unit tripped. We also received deviation alarms on a 2oo3 temperature trip that is executed in the <P> core. The trip log did not capture data since the Mark VI lost communication before L4 changed states.
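
For anyone not familiar with 2oo3 voting, here is a rough sketch of the idea. This is purely illustrative: the real Mark VI logic is built from standard application blocks, and the Python below, including the setpoint and deviation limit, is made up for the example, not taken from our .m6b.
<pre>
# Illustrative sketch of a 2oo3 (two-out-of-three) temperature trip with
# deviation alarming. Placeholder numbers only, not from the actual .m6b.

def vote_2oo3(a, b, c):
    """Median-select vote of the three redundant temperature inputs."""
    return sorted([a, b, c])[1]

def scan_2oo3(temps, trip_setpoint=650.0, deviation_limit=10.0):
    """One scan of the trip logic: returns (trip, per-channel deviation alarms)."""
    voted = vote_2oo3(*temps)
    # Deviation alarm: a channel disagreeing with the voted value by more than
    # the limit, which is the kind of alarm we saw when the failed core's data
    # went stale.
    deviations = [abs(t - voted) > deviation_limit for t in temps]
    # The trip compares the *voted* value to the setpoint, so a single bad
    # channel cannot trip the unit by itself.
    return voted > trip_setpoint, deviations

# Example: one channel frozen at zero gives a deviation alarm on that channel,
# but no trip, because the voted (median) value is still healthy.
print(scan_2oo3([612.0, 615.0, 0.0]))   # (False, [False, False, True])
</pre>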

Also, the input impedance of the IONET connector on the bad VPRO was approximately 100 megohms, while it was approximately 140 kilohms on the good VPROs.

In the past, we have successfully changed VPROs while the unit is running without affecting unit operation. Has anyone seen a similar issue before? Is it possible that the VPRO caused an IONET failure resulting in the reboot of the <R>, <S>, and <T> processors? They are supposed to be "completely" independent.

Thanks,
Chris
 
Dear Chris, I shall wait to hear what CSA and Otised have to say in this situation, but for my part I can say this.

I have had multiple failures of VPROs with no unit trip.

Without having your .m6b file and interconnect drawings, it is difficult to say why you saw the alarms for AC and 125VDC sources lost.

My guess is that something happened with the 125VDC source, possibly that the failure of "X" caused a problem with it. Pardon me if my thoughts are the same ones you have already had, but it sounds like something caused a reboot of all the cores. The fact that all three cores, R, S, and T, were stuck in the boot sequence tells me that they rebooted after "X" failed. It is my experience that you need all cores, X, Y, Z and R, S, and T, to be healthy for a successful boot to control after a loss of power or a shutdown of all cores.

The loss of data to the HMIs would be normal for me if the Mark VI is not booted to the control state. I can also say that I expect alarms related to 125VDC voltage, ground, etc. on my system if the "R" core fails, since it is usually the only core assigned to this function. The "XMIT SUSPENDED. CPU SWITCHED" alarm is usually set when a core fails, as it transfers "control", or communication, to the next core. The deviation of temperature values again sounds like a loss of data from "X" when that core failed.

As far as the impedance readings go, I can't comment.

Very interesting problem; I hope others have helpful comments.
 
Chris

I haven't experienced "multiple" VPRO failures, but have experienced a couple of them and have been able to replace the affected VPRO without shutting the (gas) turbine down, or tripping it.

I have noticed a marked difference in some of the configuration settings of various Mark VI cards and elements between gas- and steam-turbine applications. Why? Because they can; there's no other reason. Some of it is due to inappropriate changes of settings by commissioning personnel, or by field services personnel when troubleshooting. Some is due to the inexperience of newer requisition engineers and a lack of review of their work. So, without being able to see the .m6b file, it's very difficult to say if there's something in the configuration that might be amiss or could be changed to positively affect the situation in the future.

As for the resistance, I would say that if you have two values for the "working" IONETs that are the same, and one for the "problematic" IONET that is different, then that could be a source of the problem. I think, on the whole, the values are high to begin with; I don't think Ethernet (even 10Base2) cable/connector resistances are normally that high, but I haven't searched the World Wide Web for typical values before posting this reply, either.
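
(If memory serves, and assuming the IONET is ordinary 10Base2 thin-net coax with a 50 ohm terminator at each end of the segment, a DC resistance check across a healthy, terminated segment should read roughly 25 ohms: two 50 ohm terminators in parallel, plus a little cable and connector resistance. A reading taken on an isolated card connector rather than across the installed segment might legitimately be much higher, so where the meter was connected matters. A quick back-of-the-envelope comparison, using the numbers from the original post:)
<pre>
# Back-of-the-envelope check. Assumptions (mine, not confirmed): the IONET is
# standard 10Base2 coax with a 50 ohm terminator at each end of the segment,
# and the readings were taken across the terminated segment.

def parallel(r1, r2):
    """DC resistance of two resistors in parallel."""
    return (r1 * r2) / (r1 + r2)

expected_segment = parallel(50.0, 50.0)   # roughly 25 ohms across a terminated segment
measured_good = 140e3                     # ~140 kilohms reported on the good VPROs
measured_bad = 100e6                      # ~100 megohms reported on the bad VPRO

print(f"terminated segment, expected: {expected_segment:.0f} ohms")
print(f"good VPROs vs expected      : {measured_good / expected_segment:,.0f}x higher")
print(f"bad VPRO vs expected        : {measured_bad / expected_segment:,.0f}x higher")
</pre>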

This is indeed odd. I wonder how the power monitor is connected to the Mark VI, and if those connections could be contributing to the problem. I have seen the <PDM> mounted on the door, with the sensor wiring passing over the door hinge getting chafed and causing intermittent grounds. Also, I think the power sensor is "SIMPLEX", not TMR.

Some Mark VI power supplies have proven to be somewhat problematic. I believe GE has TILs (Technical Information Letters) out about some of them, and I've heard there are third-party vendors who rebuild and/or repair Mark VI power supplies to be better than OEM.

Other than that, this seems to be an aberration. It would be great to hear what you find out about 10Base2 Ethernet cable/connector resistances, and if that seems to have any effect on the operation of your turbine, though it seems you aren't experiencing any problems at this time.

Let us know what you find out, please!
 
ge_controls_galore

Thanks for the responses, guys. We are performing some testing here in our hot spare panel (simplex, unfortunately) and are planning to send the card back to GE for testing when we are finished. When we originally put the card in the hot spare rack and downloaded to it, the card was healthy. When I came in this morning, the card is now "out of sync" and the diagnostics cannot be accessed via Toolbox. The <R> processor, however, is still in the controlling state and did not reboot.

As for the system architecture, the design is very robust. We have very knowledgeable controls guys on our end, and we have worked with GE in the past to make sure we do not have any single points of failure (except for TREG/TPRO, etc.). We have changed this <X> core VPRO on the run before as well. In addition, we powered down <X> once the machine was up and running to ensure there was not another issue. Machine operation was not affected.

I included the alarm list below. I believe the trip occurred at 14:17.
<pre>
Site: H MACHINE
Unit: S1
Data Types: Process Alarms, SOE's
Time Type: Local Time
Start Time: 2013.08.04 16:44:33.000
End Time: 2013.08.05 16:44:33.000
Report: Exception

Time Unit S P Drop/Point Typ Description
------------------------ ------ -- ------------------------------- --- ------------------------------
04-AUG-2013 18:46:54.710 S1 1 Q 0076 ALM MHL701 MHC01 SEAL OIL RES MHTK01 LEVEL LOW
04-AUG-2013 18:46:59.209 S1 0 Q 0076 ALM MHL701 MHC01 SEAL OIL RES MHTK01 LEVEL LOW
05-AUG-2013 13:04:39.809 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:39.909 S1 1 Q 0256 ALM <R> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:04:39.909 S1 1 Q 0257 ALM <S> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:04:39.909 S1 1 Q 0258 ALM <T> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:04:39.927 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:39.969 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:40.087 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:40.129 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:40.248 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:04:40.289 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.024 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 13:11:35.024 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 13:11:35.043 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.043 S1 1 Q 0256 ALM <R> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:11:35.043 S1 1 Q 0257 ALM <S> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:11:35.043 S1 1 Q 0258 ALM <T> SLOT 1 VCMI DIAGNOSTIC ALARM
05-AUG-2013 13:11:35.085 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.203 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.244 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.364 S1 0 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 13:11:35.404 S1 1 Q 0184 ALM MHT700 MHT01 OVERHEAD TEMP DEVIATION
05-AUG-2013 14:07:27.665 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:07:27.665 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:12:51.102 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:12:51.102 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:17:00.480 S1 1 S0029 SOE S1\S0029
05-AUG-2013 14:17:00.480 S1 1 S0030 SOE S1\S0030
05-AUG-2013 14:17:00.480 S1 1 S0031 SOE S1\S0031
05-AUG-2013 14:17:00.480 S1 1 S0032 SOE S1\S0032
05-AUG-2013 14:17:00.480 S1 1 S0035 SOE S1\S0035
05-AUG-2013 14:17:00.480 S1 1 S0037 SOE S1\S0037
05-AUG-2013 14:17:00.480 S1 1 S0038 SOE S1\S0038
05-AUG-2013 14:17:00.480 S1 1 S0040 SOE S1\S0040
05-AUG-2013 14:17:01.269 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:17:01.269 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:17:01.269 S1 1 Q 0005 ALM L27AC_DC1A INCOMING AC #1 POWER LOST
05-AUG-2013 14:17:01.269 S1 1 Q 0006 ALM L27AC_DC2A INCOMING AC #2 POWER LOST
05-AUG-2013 14:17:01.269 S1 1 Q 0007 ALM L27BATTA INCOMING BATTERY POWER LOST
05-AUG-2013 14:17:01.269 S1 1 Q 0008 ALM L27DZA 125VDC UNDERVOLTAGE
05-AUG-2013 14:17:01.269 S1 1 Q 0026 ALM 125VDC GROUND
05-AUG-2013 14:17:01.348 S1 0 Q 0005 ALM L27AC_DC1A INCOMING AC #1 POWER LOST
05-AUG-2013 14:17:01.348 S1 0 Q 0006 ALM L27AC_DC2A INCOMING AC #2 POWER LOST
05-AUG-2013 14:17:01.348 S1 0 Q 0007 ALM L27BATTA INCOMING BATTERY POWER LOST
05-AUG-2013 14:55:06.366 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 14:55:06.366 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 15:00:29.785 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 15:00:29.785 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 15:04:34.231 S1 1 S0029 SOE S1\S0029
05-AUG-2013 15:04:34.231 S1 1 S0030 SOE S1\S0030
05-AUG-2013 15:04:34.231 S1 1 S0031 SOE S1\S0031
05-AUG-2013 15:04:34.231 S1 1 S0032 SOE S1\S0032
05-AUG-2013 15:04:34.231 S1 1 S0033 SOE S1\S0033
05-AUG-2013 15:04:34.231 S1 1 S0034 SOE S1\S0034
05-AUG-2013 15:04:34.231 S1 1 S0037 SOE S1\S0037
05-AUG-2013 15:04:34.231 S1 1 S0038 SOE S1\S0038
05-AUG-2013 15:04:34.231 S1 1 S0040 SOE S1\S0040
05-AUG-2013 15:04:39.961 S1 1 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 15:04:39.961 S1 0 Q 0000 ALM ALARM XMIT SUSPENDED. CPU SWITCHED.
05-AUG-2013 15:04:39.961 S1 1 Q 0005 ALM L27AC_DC1A INCOMING AC #1 POWER LOST
05-AUG-2013 15:04:39.961 S1 1 Q 0006 ALM L27AC_DC2A INCOMING AC #2 POWER LOST
05-AUG-2013 15:04:39.961 S1 1 Q 0007 ALM L27BATTA INCOMING BATTERY POWER LOST
05-AUG-2013 15:04:39.961 S1 1 Q 0008 ALM L27DZA 125VDC UNDERVOLTAGE
05-AUG-2013 15:04:39.961 S1 1 Q 0026 ALM 125VDC GROUND
05-AUG-2013 15:04:40.039 S1 0 Q 0005 ALM L27AC_DC1A INCOMING AC #1 POWER LOST
05-AUG-2013 15:04:40.039 S1 0 Q 0006 ALM L27AC_DC2A INCOMING AC #2 POWER LOST
05-AUG-2013 15:04:40.039 S1 0 Q 0007 ALM L27BATTA INCOMING BATTERY POWER LOST
</pre>
Thanks for your assistance, guys! We are scratching our heads right now.
 
Dear Chris, or GE Controls Galore (I think you made a name change?):
From reviewing the alarm report, I have these questions and comments:

What happened at 13:04 that caused the VCMI diagnostic alarms on all cores?

What happened at 13:11 that caused the ALARM XMIT SUSPENDED, temperature deviation, and VCMI diagnostic alarms?

What happened at 14:07 and 14:12 that caused more ALARM XMIT SUSPENDED alarms?

It seems like something was going on prior to the trip at 14:17 that was trying to warn you of an impending problem.

It would also be helpful to see your .m6b file to understand what generates the SOEs seen prior to the trip and what generates the incoming-power-lost alarms, as well as interconnect drawings that show the power feeds and configuration. As CSA says, there is a lot of flexibility in how these systems are configured, especially from gas turbine to steam turbine applications. This flexibility, along with the experience levels of field and requisition engineers, makes it difficult to completely understand or predict how a system will react during a given situation or failure.

I look forward to any more information you can provide.
 