GE MarkVIe (Migration) - IONet Timeout \ Watchdog Protection Trip

Good Evening,

Has anyone seen anything similar to the attached trip log? We have a recently installed simplex migration system (Mark V boards), and the unit tripped with the diagnostic log below. We lost all indication on the unit for ~90 s. We also checked the heartbeat on the controller, and it appears to have restarted. I am not sure how that count behaves across a reboot (whether it increments or resets), but the counts we read were consistent with a restart in this same time period. We have not seen the issue repeat since this event. Did the controller reboot itself? If so, how did it capture diagnostics for the lost IONet connections?
Thanks!
jk
 


As an update, the advanced diagnostics indicated "FPGA CRC - reset detected." Has anyone dealt with that before? We are working with support in tandem.
 
Thanks for the update! You probably already know, FPGA stands for Field-Programmable Gate Array, and I think they are used in several I/O Packs. Do you know which one(s)?
 
This was from the advanced diagnostics on “R”. I'm not sure whether that FPGA CRC error would propagate up from the I/O packs.

We would have UCSC, PIOA, PMVD, PMVE(R), PMVE(C) & PMVP.
 
I successfully avoided the Mark V-to-Mark VIe migration. Don’t know about that at all. I’ve never even had a glance at the manuals for it; it’s such a kludge. And a lot of people bought it thinking it was something entirely different, and were displeased when they saw what they got. (Same wiring; still using ribbon cables; same card carriers; etc.)

Anyway, good to hear you’re in touch with the supplier. Am interested in the resolution (since I do work on UCSx processors).
 
Thanks for the reminder, I know how important it is to follow up on questions like these for future reference.

Preliminary thoughts from the support group point to an FPGA phenomenon called a single event upset (SEU), which may cause a controller or I/O pack to reboot. Unfortunately there isn’t much else to diagnose, but we will continue to monitor.

We will look into implementing a redundant controller (R,S) configuration to mitigate this issue.

(We should have already as most other site DCS controls have redundancy.)
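
For anyone wondering how an SEU and an "FPGA CRC" message fit together: many FPGA families can check a CRC over their configuration memory, so a single flipped bit shows up as a CRC mismatch. Below is a minimal Python sketch of that idea only. It is not how the UCSC or the I/O packs implement it; the memory image, the bit position, and the function names are all made up for illustration.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated upset)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def config_is_intact(data: bytes, expected_crc: int) -> bool:
    """Compare the CRC-32 of the current image with the stored reference."""
    return zlib.crc32(data) == expected_crc


# Hypothetical stand-in for FPGA configuration memory (not real device data).
config_image = bytes(range(256)) * 4
golden_crc = zlib.crc32(config_image)  # reference CRC taken while healthy

# Simulate a single event upset at an arbitrary bit position.
upset_image = flip_bit(config_image, 1234)

print(config_is_intact(config_image, golden_crc))  # True
print(config_is_intact(upset_image, golden_crc))   # False
```

A CRC like this only says that something changed, not which bit or why, which fits with support not having much else to diagnose.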
 
lo-47,

Thanks for the information!

SIMPLEX does not have any redundancy. Is the turbine a gasser or a steamer?

SEU, huh? I’m going to add that TLA (Three-Letter Acronym) to my repertoire. Might prove handy one of these days. (As is said, “If you can’t dazzle them with brilliance, baffle them with bullshite!” Tech support is really good at it, too!)
 
This is a small 20 MW steam unit that had a simplex Mark V. I’ll look to transition to redundant controllers (R,S & X,Y,Z), as long as the I/O packs we have can support communication to R and S.

That was our thought as well on the SEU…but I guess we just don’t have additional diagnostics to think otherwise.
 
Hi, any news on this issue?
We had a trip a week ago on the steam turbine. The only Mark VIe thermocouple input card, a PTCC, logged "FPGA CRC error reset detected". This card carries the three inlet steam temperature sensors. It was offline for about 60 s (during the reset), the controllers lost communication with it, and the temperatures read zero for that period.
The logs show:
----------------
PTCC Card ERROR LOG:
....
01/31/2025 12:58:57:547 hw ResetLatch.cpp Line:261
FPGA CRC error reset detected.
01/31/2025 12:58:58:817 syncd ptpport.cc Line:271
Port 1 not configured
..
----------------
PTCC Card EVENT LOG:
01/31/2025 12:58:55:952 diag DiagSysStartup.cpp Line:495
Process Started. (Build Apr 18 2013 14:26:47)
01/31/2025 12:58:56:174 adlrout AdlSysStartup.cpp Line:524
Process Started. (Build Apr 18 2013 14:26:38)
01/31/2025 12:58:57:424 hw HwMain.cpp Line:412
Process Started. (Build Apr 18 2013 14:26:50)
... keeps starting processes..
01/31/2025 12:59:25:064 ControlState.cpp Line:471
Entering Control State EXCHANGING.
01/31/2025 12:59:25:064 ControlState.cpp Line:471
Entering Control State SEQUENCING.
01/31/2025 12:59:25:064 ControlState.cpp Line:471
Entering Control State STANDBY.
01/31/2025 12:59:25:104 ControlState.cpp Line:471
Entering Control State CONTROLLING.
----------------
Designated Controller:
..
01/24/2025 16:41:54:944 diag - Diagnostic Alarm
Alarm 469: UDH EGD fault detected by the R processor
01/24/2025 16:41:55:944 diag - Diagnostic Alarm
Alarm 470: UDH EGD fault detected by the S processor
01/31/2025 12:58:21:304 diag - Diagnostic Alarm
Alarm 1548: Inputs unhealthy on IO Module 36 IONet 1 - Message Timeout
01/31/2025 12:58:21:304 diag - Diagnostic Alarm
Alarm 1804: Inputs unhealthy on IO Module 36 IONet 2 - Message Timeout
...

I find the SEU explanation hard to buy, but I have seen this in FPGA watchdog reset implementations. What do you think?
Sorry for my English; I hope this helps.
Thanks!
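
The timestamps in the logs above can be lined up to see how long the card was away. Here is a rough Python sketch that does this, assuming the controller and PTCC clocks are synchronized (the logs do not confirm that) and that the stamps are MM/DD/YYYY HH:MM:SS:milliseconds. The event labels and function names are mine, not from any GE tool.

```python
from datetime import datetime

# Assumed layout of the log stamps: MM/DD/YYYY HH:MM:SS:mmm
LOG_TIME_FORMAT = "%m/%d/%Y %H:%M:%S:%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse one log timestamp; %f reads the trailing milliseconds."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def seconds_since_first(events):
    """Return (label, seconds after the first event) for each event."""
    start = parse_log_time(events[0][1])
    return [
        (label, (parse_log_time(stamp) - start).total_seconds())
        for label, stamp in events
    ]


# Stamps copied from the logs in this post; labels are descriptive only.
events = [
    ("Controller: Alarm 1548, IONet 1 message timeout", "01/31/2025 12:58:21:304"),
    ("PTCC: first process started", "01/31/2025 12:58:55:952"),
    ("PTCC: FPGA CRC error reset logged", "01/31/2025 12:58:57:547"),
    ("PTCC: entering CONTROLLING", "01/31/2025 12:59:25:104"),
]

for label, offset in seconds_since_first(events):
    print(f"{offset:7.3f} s  {label}")
```

With these stamps, the card is back in CONTROLLING about 63.8 s after the first message timeout, which agrees with the roughly 60 s of lost temperatures. The CRC error entry is written about 36 s after the timeout, during the restart, so it records the cause of the reset after the fact.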
 
Hey, just to give you an update: the controller was replaced with a new controller from GE. I am not sure if the revision changed, but support wanted our controller so their team could look into it further.
We have not seen the event since the replacement (probably a year or two now since it was replaced). I am not sure if what you’re seeing is the same thing, but the missing data does point to a reboot.

In your case do you have controller redundancy?
 