Mark VIe controller reboot failure problem

We operate GE 7FA gas turbines, and our main control system is Mark VIe with UCSB main controllers configured in a triple-redundant R/S/T setup.

Recently, we replaced a TSVC TMR terminal board due to a servo/RVDT deviation issue. The replacement was performed strictly according to the GEH-6721 procedure, including proper barcode entry and a complete Build & Download.

However, immediately after the download finished, the S controller attempted to reboot but failed to start up. The UCSB BOOT LED was flashing every 3 seconds, which indicates:
“Baseload signature verification has failed.”

We also confirmed that the controller was not reachable via network ping.

Using the USB backup and recovery procedure, we were able to restore the S-controller, and it eventually resynchronized with R and T.

My question is:
What could cause this kind of single-controller reboot failure and signature verification fault after an otherwise normal I/O terminal board replacement?
We have seen this happen occasionally, but not consistently, and we would like to understand the underlying mechanism or typical root causes.

Any insight from others who have experienced similar Mark VIe UCSB behavior would be greatly appreciated.
 
Did you try reflashing from a USB pen drive? Most of the time, this helps with recovery.
If it does not come online even after reflashing, then it is recommended to get in touch with the OEM for failure analysis.

By the way, you did not mention which Mark VIe runtime version you are currently using.
The OEM recommends using the latest version.
 
Thank you for the responses.
The USB recovery has already been completed, and the controller is now operating normally.
We also keep the Mark VIe runtime version up to date through regular system updates.

However, what I am trying to understand is not the recovery method, but rather why this issue occurs intermittently in the first place.
I would like to know the underlying reason or mechanism that causes a single UCSB controller to fail with signature verification errors after an otherwise normal Build & Download procedure.

If anyone has insight into why this happens periodically on Mark VIe systems, I would appreciate your input.
 
@HyeongJun,

How many Mark* VIe turbine control panels and GE excitation systems are on the same UDH network?

If there is more than one Mark* VIe and excitation system, were any of the other machines and control systems running (producing power--MW) at the time you were trying to download and reboot after a Build & Download?

Did you observe any warnings during or after the Build & Download and subsequent reboot of the UCSB processor?

In my experience, a high amount of UDH network traffic can cause issues--especially if one or more machines are running and there is a lot of network traffic either for ADH purposes or Historian purposes. And a large number of dithering Process Alarms and/or Diagnostic Alarms can also cause issues with downloads and reboots. Dithering alarms (Process and/or Diagnostic) cause a LOT of UDH network traffic. Proper Alarm Management and resolution can significantly reduce or eliminate dithering alarms.
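
One rough way to find dithering alarms is to count how often each alarm changes state in an exported alarm log. The sketch below is only an illustration: the CSV layout (`timestamp`, `signal`, `state`), the signal names, and the threshold of 4 transitions are all assumptions, not the actual WorkstationST or ToolboxST export format.

```python
import csv
import io
from collections import Counter

# Hypothetical alarm log export; real column names and signal names will differ.
SAMPLE_LOG = """timestamp,signal,state
2024-01-01T00:00:00,EXAMPLE_SIGNAL_A,ALARM
2024-01-01T00:00:01,EXAMPLE_SIGNAL_A,NORMAL
2024-01-01T00:00:02,EXAMPLE_SIGNAL_B,ALARM
2024-01-01T00:00:03,EXAMPLE_SIGNAL_A,ALARM
2024-01-01T00:00:04,EXAMPLE_SIGNAL_A,NORMAL
2024-01-01T00:00:05,EXAMPLE_SIGNAL_A,ALARM
2024-01-01T00:10:00,EXAMPLE_SIGNAL_B,NORMAL
"""


def count_alarm_transitions(csv_text):
    """Count state changes per signal (rows are assumed to be in time order)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    last_state = {}
    transitions = Counter()
    for row in reader:
        signal = row["signal"]
        state = row["state"]
        # A transition is any row whose state differs from the previous one.
        if signal in last_state and last_state[signal] != state:
            transitions[signal] += 1
        last_state[signal] = state
    return transitions


def dithering_alarms(csv_text, min_transitions=4):
    """Return (signal, transition_count) pairs at or above the assumed threshold."""
    counts = count_alarm_transitions(csv_text)
    return [(s, n) for s, n in counts.most_common() if n >= min_transitions]


if __name__ == "__main__":
    for signal, count in dithering_alarms(SAMPLE_LOG):
        print(signal, count)
```

Signals that show up at the top of such a list on both running and off-line machines would be the first candidates for alarm clean-up.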
 
Thank you for your reply.

We have 4 GTs, 4 excitation systems, 2 LCIs, and several other devices all connected to the same UDH network.
During the issue, the two GTs we were working on were offline, but the other two GTs were running at more than 120 MW.

During the Build & Download process, we did not observe any specific warnings.

Compared to a few years ago, our system now includes additional logic, as well as more Historian and alarm-related functions. I can see how these might contribute to higher UDH traffic and potential issues.

Is there a recommended way to check or monitor the UDH network loading or traffic level?
Any guidance or tools for diagnosing UDH network performance would be very helpful.
 
@HyeongJun,

The biggest issue I have seen--repeatedly--with UDH network traffic is poorly/improperly configured MODBUS lists, with misspelled signal names and non-existent signal names. ALSO, an excessive number of configured MODBUS signals can cause UDH issues; simply adding nearly every signal to MODBUS lists is not a good idea. And some MODBUS masters simply aren't capable of handling high numbers of signals, and when there are issues with signal names it can cause even more issues for MODBUS communications. Now this was when the MODBUS client(s) were configured on the HMI(s), and if the Mark* VIe uses the serial communication modules for MODBUS communications that decreases the UDH network traffic somewhat.

In my experience, there are times when there are issues with downloading after a Build when EVERYTHING is selected to be downloaded at the same time (app code; configuration; etc.)--and there are times when no errors or warnings are detected. It just happens--and it usually happens when there are multiple machines on the UDH. Many people have had to resort to reflashing the flash memory cards, and it has even happened that something really serious happens to the UCSB that requires replacement (though those times are rare, they do happen).

I've also seen issues with poor app code changes (usually with signal name additions and alarm additions (HINT; HINT)) that weren't flagged during Building. Sometimes, simply reBuilding fixes things, and sometimes one has to go back and go through all the changes that were made (and this is when it's REALLY IMPORTANT to have a detailed list of any and ALL changes made before Building!) checking to see that everything was done properly.

The Mark* VIe is a fine turbine control system--probably one of the best ever made for any machine. BUT, as with everything these days it is getting more and more complicated and functionality keeps getting added and modified and things which worked fine just stop working (and GE is woefully unable to help with these kinds of issues) and when trying to implement new functionality which wasn't originally used other things/functions just stop working. It's kind of the nature of technology that as things become more capable they also seem to become a little more unreliable in some cases. (I recall a time when iPhone software was really buggy and a lot of users/owners opted for Android phones--which weren't really ready for prime time usage and people ended up going back to iPhones years later because they got tired of the lack of updates and buggy Android apps. Unfortunately, in this case one can't go to a bodega and get a new turbine control system with a different OS, but the point remains: There are times in the product life of most things these days where there is a stretch of less than stellar updates that are released.)

Anyway, I don't know of any guidance or tools for diagnosing UDH network performance, short of Wireshark for capturing Ethernet network traffic and then getting GE to review and analyze the data. Perhaps @Swami can help with suggestions or tell us of ToolboxST tools for monitoring/capturing UDH network traffic. Alarms and Events are given very high priority on the UDH; I don't know what kind of priority Downloading has when trying to configure/modify app code/configuration.
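
For a first look before sending a capture to anyone, a Wireshark packet list can be exported as CSV and summarized with a few lines of script. The sketch below assumes Wireshark's default packet list columns (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`) with `Time` in seconds since the start of the capture. The IP addresses and packet sizes in the sample are made up for illustration, and this is not a GE-provided tool or method.

```python
import csv
import io
from collections import Counter

# Made-up sample in the shape of a Wireshark "Export Packet Dissections > As CSV" file.
SAMPLE_CAPTURE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.101.11","192.168.101.255","UDP","400","example"
"2","0.250000","192.168.101.12","192.168.101.255","UDP","600","example"
"3","1.100000","192.168.101.11","192.168.101.255","UDP","400","example"
"4","1.900000","192.168.101.21","192.168.101.50","TCP","120","example"
'''


def summarize_capture(csv_text, top_n=3):
    """Return total packets, the busiest one-second window, and the top senders by bytes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bytes_per_second = Counter()
    bytes_per_source = Counter()
    packets = 0
    for row in reader:
        second = int(float(row["Time"]))  # bucket packets into whole seconds
        length = int(row["Length"])       # frame length in bytes
        bytes_per_second[second] += length
        bytes_per_source[row["Source"]] += length
        packets += 1
    peak = max(bytes_per_second.values()) if bytes_per_second else 0
    return {
        "packets": packets,
        "peak_bytes_per_second": peak,
        "top_talkers": bytes_per_source.most_common(top_n),
    }


if __name__ == "__main__":
    print(summarize_capture(SAMPLE_CAPTURE))
```

Comparing the peak figure and the top senders between a quiet period and a Build & Download window would at least show whether loading changes noticeably, even if interpreting the traffic itself still needs the OEM.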

And, it hasn't gone unnoticed that you did not respond to the question about dithering alarms (from running and off-line machines!)--and they can and have caused problems with UDH operation and reliability. I don't really expect that GE had done or does a lot of testing while Downloading with multiple Mark* VIe panels, exciters and LCIs all communicating on the UDH.... I suspect the majority of testing has been and was done with a single HMI and a single Mark* VIe on the UDH, with no field devices connected to the Mark* VIe. (I know for a fact this was how it was done in the early days of Mark* VIe...)

Best of luck, and if you discover anything of use we would love to know!
 
The symptoms you mentioned closely match the "Enhancement in ECC correction" TIL that was released a few years back (I don't remember the actual number).
Please mention the Mark VIe runtime version you currently have.

If you want to know the root cause, then you must collect the controller's full advanced diagnostics logs from before and after the event and share them with the OEM for further advice.
 
Modbus on WorkstationST or a PSCA IO Pack doesn’t add significant network loading, since it uses a specific port (I think mostly 502 or 503) instead of broadcast. I do agree that the quantity of signals matters.
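
To see how much of a capture is broadcast versus point-to-point traffic, each destination address can be classified against the network's subnet. This is a minimal sketch using Python's standard `ipaddress` module; the `192.168.101.0/24` subnet and the addresses in the usage lines are placeholders, not actual UDH addressing.

```python
import ipaddress


def classify_destination(dest_ip, subnet):
    """Label a destination IPv4 address as broadcast, multicast, or unicast."""
    addr = ipaddress.ip_address(dest_ip)
    net = ipaddress.ip_network(subnet)
    limited_broadcast = ipaddress.ip_address("255.255.255.255")
    # Subnet-directed broadcast (e.g. x.x.x.255 on a /24) or limited broadcast.
    if addr == net.broadcast_address or addr == limited_broadcast:
        return "broadcast"
    if addr.is_multicast:
        return "multicast"
    return "unicast"


if __name__ == "__main__":
    # Placeholder subnet and addresses for illustration only.
    for ip in ("192.168.101.255", "224.0.0.1", "192.168.101.50"):
        print(ip, classify_destination(ip, "192.168.101.0/24"))
```

A Modbus TCP session to a single port would show up as unicast here, which is why it loads only the two endpoints rather than every device on the network.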

As WTF? mentioned, Wireshark is “the best” for network monitoring & analysis.

@WTF?: At one of their user conferences, in a side conversation, one of the booth display staff showed me the OEM system test setup and claimed that they test with real IO Packs and simulated plant-level field inputs from some high-fidelity simulation before every release of the product. It was a huge setup with several panel lineups, and almost every IO Pack was used. They did not allow me to take a picture of it. I was curious to know about their new IO Packs, but the images were blurred intentionally.

The OEM’s partner team or their life care team was pitching some new switches & a cyber security system. You may check with the OEM for the upgrade package if you face network issues.
 
@WTF? , @Swami

Thank you both for the technical insights.

Regarding the Mark VIe runtime version you mentioned — I’m not entirely sure which specific version this refers to.
However, in ToolboxST, our system shows the following:

Baseload: V04.00.04C

Firmware: V06.06.05C Build 132


Are these the versions you are referring to when asking about the Mark VIe runtime version?
 
The Firmware version is the one I was looking for.
This was released as part of ToolboxST version V07.04.09C some time in late 2019.

If that is the one you have, then you don't have the required ECC enhancement service pack.
I would suggest getting in touch with the OEM and asking for an upgrade to ToolboxST V07.09.xxC or V09.10.xxC.
 