Mark VI IONet N-Tron Network switch failures


Thread Starter

D.V

These Ethernet IONet components (switches) fail, and a diagnostic alarm is received in the DCS: "Input unhealthy on IO module IO net1 message time out & Output unhealthy on IO module IO net1 message time out"

Possible reasons: internal enclosure temperature of 53 deg C, lack of maintenance, power quality issues, questionable redundancy. An RCA is needed.
 
D.V,

I'm not aware of any N-Tron switches used with Mark VI equipment; Mark VIe, but not Mark VI.

<b>WHEN DID THIS PROBLEM START?</b>

I'm not sure if this is the proper forum for this question. Shouldn't you be working with the supplier of the equipment (I presume that's GE, or one of its packagers)?

Have you contacted GE or the supplier/packager for assistance? What has been their response?

53 deg C compartment temperature sounds <b>awfully</b> high. That just doesn't seem very sustainable for most electronic equipment. Something seems amiss with cooling or design or location. Are these cabinets located in direct sunlight in desert locations? If so, can it be that the designers of the cabinets were not aware of how the equipment would be positioned, or that the installers did not take into account the requirements of electronic equipment?

You haven't provided all the information about when the problem started, what's been done to try to mitigate the problem, details of the installation and maintenance of the equipment in question. In short, there's a lot that's not known about this application.

GE did issue a Product Service Bulletin some time back (a couple of years ago, I believe) about some quality problems with the RJ-45 connectors used by N-Tron in the construction of the switches.

There is also a Product Service Bulletin out about the minimum bend radius of Ethernet cables being plugged into the RJ-45 connectors of N-Tron switches; if the bend radius is too tight the electrical connectors can be subject to misalignment. I could certainly foresee a condition where a tight bend radius along with what I would consider to be extremely high ambient temperatures could lead to an intermittent problem, or even failure.

But, if you're asking someone on this forum to do an RCA, I think you're asking in the wrong place. The people to perform the RCA are the people who designed the equipment, in conjunction with the people who installed the equipment, and the people who maintain the equipment. All the aspects of the application, environment, and maintenance need to be considered when performing an RCA.

I would suggest you write up your concerns, along with the number of failures you have experienced, and include the failed switches in the package, and then contact the supplier (GE or one of their packagers) and ask for help in resolving this problem.
 
Pardon,

It's a Mark VIe control for control and protection of steam & gas turbines and power generation balance-of-plant equipment. N-Tron network switches are used in a TMR Mark VIe system and they fail randomly (2 on ST, 1 on GT, 2 on BOP). They are mounted at the top of the panel enclosure next to the power modules (48 V DC, 28 V DC), and are installed inside the plant, not exposed to sunlight or dust/sand. According to the operating environment specification, an ambient temperature of -30 C to 65 C is acceptable. The failure rate works out to 2.5 to 3 failures per year, occurring when the temperature rises (5 so far). According to the Diagnostic Alarm Troubleshooting manual, the possible cause is an I/O pack communication malfunction or an IONet malfunction, and the solution is to check I/O pack health and IONet diagnostics (cables, switches). I am aware of the high temperature (50-53 C), but I am concerned that I am missing something... (I have that feeling.) GE, as usual, acts as a businessman, so an RCA from their side is very general, and every attempt to raise a reliability issue they immediately dismiss...
 
How old is this installation?

Do you know if you have reviewed and replaced any N-Tron switches per the Product Service Bulletins? (I believe serial numbers from the Product Service Bulletins can be used to identify faulty switches.)

How are you measuring the temperature inside the control panel enclosure where the N-Tron switches are located? Do you have an independent temperature sensor (T/C or RTD) suspended in a location near the area where the switches are installed to monitor the temperature?

Where did you obtain the ambient temperature data you provided? Can you tell us where in the Mark VIe System Guide you found this information?

Is there any ventilation in the control panel enclosures?

Forced (fans)?

Convection (natural circulation via louvers on the doors and/or in the roof of the enclosure)?

Are there filters over any of the louvers? If there are filters over any of the louvers, what are the conditions of the filters?

What troubleshooting have you done to try to resolve the problem? What were the results of any troubleshooting you did to try to resolve the problem?

In your opinion, would a ventilation fan or -fans be capable of providing sufficient air circulation for cooling?

Lastly, your attitude about large companies is pretty prevalent in some parts of the world, as if large companies just want to take your money and leave you with a pile of rubbish. GE has been in business for more than a century, and couldn't very well remain in business treating their Customers like that.

If you search control.com, you will find several threads where a GE employee has provided a website where registered users can obtain information and even help via email or telephone. Have you availed yourself of this offer?

Here's the link to one such thread:

http://www.control.com/thread/1307947213#1313704775 (remove any spaces inserted by the webpage software when you paste the URL into your preferred Internet search engine).

Be prepared to provide some information about the site where you work during registration, and it will take approximately 24 hours or so (I'm told) for your registration to be approved, after which time you can search and post to your heart's content.

I will warn you in advance (having been a GE employee in the past), they will ask for lots of information and data. Some of it may seem irrelevant. If it does, step back and make sure you have provided a good description of the problem (certainly better than you have provided here), as well as actionable data (temperatures, photographs, etc.). Also, remember that while you may think the requested information is irrelevant it may not be. GE is not always the best at communicating why they want some information. You are free to ask, nicely, why this or that information may be necessary but if you don't provide the information they ask of you then you probably won't get the assistance you are looking for. You have to be part of the solution, and you have to provide a good definition of the problem, if you want to achieve a satisfactory outcome.

Again, GE could not have remained in business all these decades (more than 10!) by ignoring Customers' problems. If your site is experiencing what you believe to be excessive problems with their GE equipment, use all the resources you have to get them to understand the problem and help you resolve it--working with them. You will find they aren't such a bad company after all.

Please write back to let us know what you think of the site and the assistance you obtained through the link above!

I would still like to see your answers to the above questions. And, I think GE will ask similar questions, anyway. (Hint. Hint.)
 
We are facing the same problem in our plant (GE Mark VIe, 9FA machine): every year, 8 or 9 switches fail.

Across the whole plant we have 240 N-TRON switches. As per the recommendation, we replaced them all with new-series switches, but the failure rate is still not good. We sometimes observe that the designated controller, R, reboots itself; the heartbeat stops and controller "out of state" alarms are received. Our network P core PPRO I/O pack communication is connected to the R, S, T N-TRON switches.

Are there any issues with this communication being connected directly to the controller?

The panel temperature is maintained at 36 deg C, and the switches are mounted at the top of the Mark VIe panels. Is relocation required?

Whenever we replace the N-TRON switches, the problem is solved and all alarms return to normal. Is the problem with the N-TRON switch? Or the controller? Or the P core PPRO?

Most of the failures happen with the N-TRON switch of the designated controller, R. Please tell us what solution you found. I am requesting that anybody who has experienced a similar problem share any solution.

Thanks in advance
 

bicycle mechanic

Relocate the switches to cooler areas.

Bottom of the control cabinet. You will have to unbundle all that wiring.

When those doors are closed... AC from the PEEC does not get circulated inside, or just not enough.

Those MKVI power supplies put out a lot of heat.

I have noticed this on a lot of LM packages.

It always seems that the cooling for outside cabinets on the side of the package does not work or is not adequate.
 
Thank you very much, Mr bicycle mechanic. We are looking for an opportunity to modify the location.

Does anybody have experience of the heartbeat becoming 0 when an N-TRON switch fails? Has anyone observed the relay of the PPRO pack connected to that network switch chattering?

Please share your experience of the above issue on N-TRON switch failure.

Thanks in advance
 
What causes a relay to chatter?

Weak holding current, or some intermittent reason.

If the switch is failing, chances are the built in power supply section is being affected.
 
VN,

Would it be easier, at least for a trial period, to add a fan, or fans, to the turbine control panel to increase the flow of air from the vents at the bottom of the panel through the vents at the top of the panel?

As was mentioned previously, if there are filters over the vents at the bottom of the door they can block quite a bit of air flow if they are allowed to get dirty and remain dirty or not be cleaned periodically.

I have been to MANY sites that have added inexpensive fans (sometimes called "muffin fans" because of their small size) which require minimal current from an AC mains (110/220 VAC) to operate. Some were installed as a test (temporarily) to see if they helped with cooling the control enclosure and components and were left there for many years without permanent wiring/conduit and did quite a good job of reducing internal component and enclosure temperatures. Many were made permanent because they were so helpful in reducing temperatures.

Again, this could be a temporary test, and the test might include moving the fan(s) to different locations to see if one location was better than another at helping reduce internal and component temperatures.

It doesn't take a lot of flow to increase circulation. Increasing the air flow does usually increase the amount of dirt that flows into the control panel enclosure, and as we all know dirt and humidity are the second and third most common passive causes of electronic component failures--after heat. So, it may also be necessary to install filters, and then to regularly inspect and clean/replace them, as part of the test or routine if it's found the fan(s) help with the cooling issue.
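As a rough sanity check on how much flow is actually needed, the usual sensible-heat relation (flow = heat load / (air density x specific heat x allowed temperature rise)) can be sketched as below. The 500 W heat load and 10 K rise are purely illustrative assumptions, not figures from any Mark VIe documentation, and the air properties are approximate values for room conditions.

```python
# Rough estimate of the air flow needed to carry a given heat load out of an
# enclosure. All input figures are illustrative assumptions, not site data.

AIR_DENSITY_KG_M3 = 1.2      # approximate, air near 20 deg C at sea level
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat of air


def required_airflow_m3_per_h(heat_load_w, allowed_rise_k):
    """Volumetric flow (m^3/h) so the air warms by allowed_rise_k while absorbing heat_load_w."""
    if allowed_rise_k <= 0:
        raise ValueError("allowed_rise_k must be positive")
    flow_m3_per_s = heat_load_w / (
        AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * allowed_rise_k
    )
    return flow_m3_per_s * 3600.0


# Hypothetical panel: 500 W of dissipated heat, 10 K allowed air temperature rise.
flow = required_airflow_m3_per_h(heat_load_w=500.0, allowed_rise_k=10.0)
print(f"{flow:.0f} m^3/h")
```

With these assumed numbers the result is roughly 150 m^3/h. The real heat load would have to come from measured power supply currents, and the flow a fan delivers through dirty filters can be well below its rated free-air figure.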

The I/O Packs, especially those associated with "core" discrete I/O terminal boards (combined discrete inputs & -outputs), and the "core" analog I/O terminal boards (TCAT/PCAA & TCAS) seem to generate quite a lot of heat. Those I/O Packs and terminal boards should really not be located in a vertical fashion with more than three or four I/O Packs located in a vertical plane, especially the TCAT/PCAA assemblies.

But, it would be very difficult and time-consuming to relocate terminal boards and field wiring, so the most expedient thing to do would seem to be to try increasing the air circulation through the control panel enclosure--without adversely increasing the dust/dirt flow through the enclosure.
 
We too are facing a lot of failures in N-tron switches (516/508 Fx and tx). After an RCA by N-tron, the switches were changed to Revision D from Rev C. But, the failure is still continuing. As per GE, the failure of Rev D switches is happening only in our site- nice to know that you too have such an issue and we can work together to resolve this.

We have switches installed in the topside of control cabinets in PEECC, and temperature is controlled at 21-23 Deg C throughout the year. Cabinets have louvers at the top and bottom, without filters or fans.

According to the RCA, one of the Zener diodes in the power supply circuit of the Rev C switches was underrated, and they came up with Rev D, which is creating more problems for us. Rev C used to fail once and for all, but Rev D fails and comes back on its own. This toggling switches the designated controller from one to the other if the switch is directly connected to the controllers. Another thing is the number of nuisance alarms it generates. Identifying the "toggling" switch is another big task: we have to station a dedicated person in front of the panel for three to four hours to identify the culprit.
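If link up/down events can be exported from an alarm or event log, the hunt for the toggling switch could in principle be automated rather than watched for by a person. The sketch below is a minimal illustration only: the event tuple format and the switch names (SW-R1 and so on) are hypothetical and do not correspond to any actual ToolboxST or N-Tron export format.

```python
from collections import Counter


def count_link_flaps(events):
    """Count down-then-up recoveries per switch.

    events: iterable of (timestamp_s, switch_name, state) tuples in time
    order, where state is "up" or "down". This format is an assumption.
    """
    last_state = {}
    flaps = Counter()
    for _timestamp, switch, state in events:
        # A flap is a switch coming back up after having been seen down.
        if last_state.get(switch) == "down" and state == "up":
            flaps[switch] += 1
        last_state[switch] = state
    return flaps


# Hypothetical event list for illustration.
events = [
    (0.0, "SW-R1", "up"),
    (12.5, "SW-S1", "down"),
    (12.9, "SW-S1", "up"),
    (40.2, "SW-S1", "down"),
    (40.6, "SW-S1", "up"),
    (55.0, "SW-T1", "down"),
]
flaps = count_link_flaps(events)
for switch, count in flaps.most_common():
    print(switch, count)
```

A switch that goes down and stays down (like the hypothetical SW-T1 here) is not counted as a flap, which matches the distinction drawn above between the Rev C and Rev D failure behaviour.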

We are yet to finalize on an action plan, to resolve this issue.
 
Sagir,
Is it possible that the Rev. C switch failures have caused some problem(s) with the 28 VDC power supplies or with the 28 VDC power distribution to the N-tron switches resulting in intermittent problems with the Rev. D switches?

Do you have the ability to monitor the 28 VDC power supplies to the switches using a storage oscilloscope or some other recording device to check for spikes or dips in voltage that can be linked to the intermittent problems with the Rev. D switches?
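Once voltage samples have been captured, from a scope export or any other recorder, flagging excursions is straightforward. This is only a sketch: the +/-5% tolerance band and the sample data are assumptions chosen for illustration, not limits from GE, N-Tron or the power supply manufacturer.

```python
def find_excursions(samples, nominal_v=28.0, tolerance=0.05):
    """Return the (timestamp_s, volts) samples outside nominal_v +/- tolerance.

    tolerance is a fraction of nominal; 0.05 is an assumed band, not a spec.
    """
    low = nominal_v * (1.0 - tolerance)
    high = nominal_v * (1.0 + tolerance)
    return [(t, v) for t, v in samples if v < low or v > high]


# Hypothetical 1 ms samples of the 28 VDC bus.
samples = [
    (0.000, 28.1),
    (0.001, 27.9),
    (0.002, 24.3),   # dip
    (0.003, 28.0),
    (0.004, 30.2),   # spike
]
excursions = find_excursions(samples)
for t, v in excursions:
    print(f"{t:.3f} s: {v:.1f} V")
```

The timestamps of any excursions can then be lined up against the times of the switch communication alarms to see whether the two are linked.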

What is the manufacturer of the 28 VDC power supplies at your site?

Do you have sufficient spares to be able to replace all the 28 VDC power supplies, or at least a majority of them, to see if that lessens or eliminates the problems with the Rev. D switches?

When the Rev. D switch problems occur, have you been recording which IONET the problems occur on? Is it primarily with <R>, or <S> or <T> or just random?

Are you sure about the "cleanliness" of the power supply to the 28 VDC power supplies?

Does the site experience lightning strikes?

Are there a lot of battery grounds on the unit(s)?

What is the application: generator drive or mechanical drive? What is the turbine (gas; steam; Frame size; etc.)?

It seems that N-Tron has been working with you and/or GE to try to understand the problem(s), so that is good.

Please write back to let us know how you fare in the resolution of your problem(s).
 
Dear Sir,

1. Power supplies are common for the IO packs as well as the network switches. No failures are reported so far for the power supplies.

2. We do have the ability to monitor the 28 V supply. No dips/spikes observed, but we are yet to check with a scope. Monitoring is done through the DCS only, with millisecond resolution.

3. Power supply make is Traco.

4. Replacing all the switches- not feasible at present, as we'll be having around 300 of them.

5. Failures are random in nature- not related to any one controller.

6. We'll confirm cleanliness after testing with a scope.

7. No lightning disturbances observed at site.

8. Battery grounds happen but are rectified as and when they occur.

9. It is an IWPP, with gas and steam turbines as well as desalination. The total plant control is on the Mark VIe platform.

One more issue that came to mind is the supply rating for the switches. They are rated from 10 to 30 V DC and we are operating at 28 VDC. This can be an issue, we feel. And, on the 28 VDC bus, we don't have any spike arrestors, but it is going only to electronic cards/switches.

We're still struggling with this issue and any suggestions are welcome.
 
sagir,

1. My feelings were that the Rev. C switches might have caused some problems with the Traco power supplies, which are generally very reliable--but in this circumstance might be suspect. Have you examined the loading of the power supplies? How much is the total current requirement of all the devices powered by each power supply? I would recommend installing a physical ammeter (possibly clamp-on?) to determine the actual, physical power draw for each power supply.
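The loading check itself is simple arithmetic once the individual currents have been measured. A minimal sketch follows; the device names, the currents (in mA) and the 5 A supply rating are all hypothetical placeholders, to be replaced with clamp-meter readings and the nameplate rating of the actual supply.

```python
def supply_loading(device_currents_ma, rated_current_ma):
    """Return (total_ma, fraction_of_rating) for one power supply."""
    total_ma = sum(device_currents_ma.values())
    return total_ma, total_ma / rated_current_ma


# Hypothetical measured currents for the devices fed by one 28 VDC supply.
measured_ma = {
    "I/O pack 1": 400,
    "I/O pack 2": 400,
    "IONet switch R": 300,
}
total_ma, fraction = supply_loading(measured_ma, rated_current_ma=5000)
print(f"{total_ma} mA, {fraction:.0%} of rating")
```

Repeating this per supply gives a quick view of which supplies, if any, are running close to their rating.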

Also, I have seen some very interesting patterns show up and become quite obvious when failures are "plotted" and tracked. It doesn't take much to do this, and it can and quite frequently does yield some very surprising results.
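Tracking failures does not need anything elaborate; even a simple tally by network, cabinet and month can expose a pattern. The records below are invented for illustration, and the field names (date, cabinet, ionet) are assumptions about what a site might choose to log.

```python
from collections import Counter

# Hypothetical failure log; each entry is one switch failure.
failures = [
    {"date": "2014-05-02", "cabinet": "GT1", "ionet": "R"},
    {"date": "2014-06-17", "cabinet": "GT1", "ionet": "S"},
    {"date": "2014-06-29", "cabinet": "ST1", "ionet": "R"},
    {"date": "2014-07-08", "cabinet": "GT1", "ionet": "R"},
    {"date": "2014-07-21", "cabinet": "BOP", "ionet": "T"},
]

by_ionet = Counter(f["ionet"] for f in failures)
by_cabinet = Counter(f["cabinet"] for f in failures)
by_month = Counter(f["date"][:7] for f in failures)   # "YYYY-MM"

for label, tally in (("IONet", by_ionet), ("Cabinet", by_cabinet), ("Month", by_month)):
    print(label, dict(tally))
```

Adding columns for switch revision, power supply, and panel temperature at the time of failure would let the same tally be run against each suspected cause.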

2. The DCS is Mark VIe, right? So, you're using Trend Recorder?

A digital storage oscilloscope would still be helpful in troubleshooting--and proving or disproving a hypothesis.


> 4. Replacing all the switches- not feasible at present, as we'll be having around 300 of them.

I did not suggest replacing the <b>switches</b>; rather I suggested replacing the <b>power supplies</b>, possibly in a cabinet by cabinet fashion and tracking failures by cabinet to see if it has any effect.

It is extremely common for devices these days to have a range of power input voltage. The Traco power supplies probably have voltage/current limiters on their outputs, but that's just an educated guess as they are not inexpensive. (Think about most 4-20 mA transmitters; they are commonly rated for operation from 10-40 VDC, though the most common power supplies are 24 VDC--except Mark V (~21 VDC).)

GE is now using Phoenix QUINT 24 VDC power supplies that are adjusted ("cranked up") to 28 VDC, and they can be had in a wide range of output currents. And they are usually physically smaller than comparable Tracos--and possibly less expensive....

What is the power source for the Traco power supplies? Does it vary from gas turbine to steam turbine to DCS, some having 125 VDC, others 120/220 VAC? Do any of them get powered by inverter/UPS?

Have you tried contacting Traco? They are a very reputable vendor, and would probably be a good ally in this troubleshooting endeavour.

And, allies are always good in this kind of endeavour.

Have you made a Fishbone diagram (Sick Sigma methodology/terminology) of all the possible problems? There are a LOT of sites around the world using Mark VIe very successfully, so there must be something unique about this application. I imagine the "Projects" department of GE was contracted to design/build the plant, and they usually can be counted on to deflect responsibility for any problems to the various GE product departments (for example, GE-Salem--whatever they're called these days)--but they usually can bring some very great pressure to bear on the product departments to get problems solved.
 
Nice to read and learn about MKVIe. Though I only have MKV experience, I am now feeling confident to handle MKVIe too. Thanks for the information.

take care
g.rajesh
 
Dear all,

We have the same problem on three of our units: GE MS3002 gas turbines driving two-stage Sulzer compressors as natural gas boosters. The machines used to work quite well on Mark II until we decided to upgrade to Mark VIe. Since the beginning we have noticed various diagnostic alarms of communication failure (I30comm_alarm); the N-Tron switches seem to be turned off and then on very quickly. We can see that from the green LEDs of the RJ45 ports, which blink from time to time, causing communication interruptions, and we confirmed it with the Toolbox trender, which we used to monitor the communication signals from each switch.

The frequency of these interruptions used to be under control, but it became worse with time. We replaced the faulty switches several times, and we managed to avoid a lot of machine trips by identifying the faulty switches and then aligning them within the same controller (S controller), in order to avoid the situation where two of the three controllers (2oo3) are out of control at the same time. But the situation is getting worse now in summer.

The strange thing about those Ntron faulty switches is that they return to normal if we use them in other machines we have in our site (besides the three I mentioned, we have several other machines working with Mk VIe very fine). For us this confirms that heat is the main cause of their failure, since we have also checked the power supply (125 VDC and 220 VAC) and it seems good, with no ground faults. The temperature inside the cabinet is quite high. The units are located in a desert area, but the control rooms are well air-conditioned. There are louvers at the top and bottom of the cabinets, with fans at the bottom to circulate fresh air. Some of our colleagues think that maybe the A/C is just not enough. Personally, I also have some doubts about the RJ-45 cables, since some of them are bent at a very harsh angle (90°).

So please, could anyone tell me whether a deteriorated RJ-45 cable could cause a problem for the whole switch and make it malfunction in the way I described above?

One final thing I want to mention is that the Mark VIe in two of the three units I mentioned above is also used as a BMS for gas heaters and a PCS for gas dryers, with a lot of extra I/O not coming from the gas turbine. Of course, we realized very late that it was a bad decision not to dedicate a proper controller (PLC, BMS) to those two processes. I would like to know if there is a maximum number of I/O not to exceed when using Mk VIe for control, and whether a lot of I/O packs could simply make the system overheat.

Thanks for any help or comments, I really appreciate any feedback from your part.
 
Badr,

Severely bent Ethernet cables will definitely adversely affect communications; GE put out a TIL (Technical Information Letter) to that effect. It doesn't cause heat, but puts a mechanical strain on the RJ-45 connectors--both on the cable and in the switch receptacle. (I've often marveled at the flimsiness of Ethernet RJ-45 connectors and how they are used in critical applications. They're inexpensive but also not very forgiving of mishandling or high tension stresses.)

There is really no limit to the number of I/O--except that there is a limit to how many I/O Packs can be in one panel simply based on the heat they generate and the ability to get that heat out of the panel.

If the I/O Packs are arranged in vertical columns the heat generated by the lower packs will rise increasing the temperature of the packs above. The packs generate heat, but it's the vertical spacing which is bad for the packs--and can also affect IONET communications of the packs. If you have a heat problem to begin with, AND if the I/O Packs are arranged in tall vertical columns with limited spacing that's going to make the upper packs get really hot and experience more problems than they might otherwise. And if there are a lot of packs in one panel with poor circulation AND in tall vertical columns that's also not good.

My experience with control panel fans is that they work better at the top of the cabinets, pulling hot air out of the top and cool air in at the bottom. Filters, if they don't allow sufficient air flow, can make the problem worse regardless of where the fans are located.

The fact that the switches work in other locations seems to point to heat as being the problem. And, in fact, it may not be the switches at all--it may be the I/O Packs getting too hot. In fact, that's a more likely issue given that you're seemingly replacing switches with no real improvement. I haven't seen switches adversely affected by heat but have seen a lot of I/O Packs have intermittent problems. (Amazingly, when the control panel doors are opened the problems seem to dissipate as the heat in the panel dissipates, and then some time after the doors are closed again the problems start up again. Funny how that works...)

A lot of desert locations seem to be having heat-related problems with Mark VIe systems. Some were even put in locations without air conditioning....

Once you get the kinks worked out, you'll realize how much better the Mark VIe is than the Mark II ever was (actually, you probably already do--it's just the unfamiliarity with the Mark VIe and these intermittent issues which are probably more related to heat and I/O Packs than heat and N-TRON switches). And, you may find that you need some box fans or something on the floor to help circulate cooler air into the lower panel louvers and that by moving the existing door-mounted fans to the upper louvers to pull hot air out of the top of the panel (which will pull cool air in through the bottom) that the IONET communication issues will go away. Just remember that house-keeping is very important, especially if you try using fans on the floor of the compartment to help with air circulation!

Please write back to keep us updated on your progress!!!
 
CSA,

Yes, the I/O packs are placed in tall vertical columns. The switches are placed at the very top. They seem to collect all the heat which builds up at the top. We did open one of the N-Tron switches, and clearly the electronic circuit was grilled by overheating.

We don't have enough space to relocate them to the bottom. We are thinking about making an opening in the top of the cabinet to evacuate heat.

Maybe a better solution is installing vortex tube cabinet coolers. We do use them in some outdoor-located PLC cabinets. http://www.airtx.com/cabinet-coolers/

Is there a way to troubleshoot whether the communication interruptions are being caused by the I/O packs and not the switches? Which kinds of signals/alarms should we look for?

Regards
 
Badr,

I'm a little confused because you wrote earlier:

"...the strange thing about those Ntron faulty switches is that they return to normal if we use them in other machines we have in our site (besides the three i mentioned, we have several other machines working with Mk VIe very fine)...."

Now you say you've opened them and they are "grilled."

Anyway, any cooler would probably help, and I would suggest trying to relocate the existing fans from the bottom of the panel to the top of the panel, and orienting them such that they draw air from inside the panel and exhaust to outside the panel. Leave the louvers at the bottom of the panel as unobstructed as possible. Cool/Cold air is dense, and generally falls to the bottom of the compartment, and by relocating the fans to the top of the panel and leaving the lower louvers unobstructed I believe you will find the fans will draw the hot air out of the top of the panel which will create a natural draft which will draw the cool/cold air in through the bottom of the panel.

This would seem a simple and "inexpensive" attempt without going through the expense and effort of purchasing and installing the vortex coolers--which will require additional equipment (air compressor(s)) and piping. And to work properly I believe the panel venting has to be modified, also. They are excellent coolers, and fairly simple devices but, again, they need auxiliary equipment and modifications to work properly.

"Cooling" in a control panel isn't always just about heat removal--it's also about humidity control. If the panel gets too cold and the ambient environment outside the compartment is humid, then I've seen panels severely damaged if the compartment door gets left open or opens because it wasn't shut properly and the moisture in the outside air condenses on the electronics. These vortex coolers can significantly reduce the temperature in panels/enclosures--but too much cooling is also not good if the outside ambient is humid.

The packagers of the Mark VIe aren't really trained to observe the heat rejection requirements of the Mark VIe components, and simply try to put too much equipment in panels--and end up doing so by making tall, vertical columns of terminal boards with I/O Packs. Put your hand over any PDIO I/O Pack while the unit is running and you will feel a pretty significant amount of heat from a single pack. Put a vertical "string" of these packs together and you've got a pretty good "heater" going. Also, if the control panel uses the combined analog I/O terminal boards (TCAT) with the very large I/O Packs--they can generate a significant amount of heat, also.

I have heard of people relocating the N-TRON switches to the bottom of the panel, but most times there isn't the space available to do so, and the field wiring doesn't make that really practical.

I don't know of a way to monitor IONET communications to look for faults/collisions with Trend Recorder or any other software "tool." I'm sure GE can tell you how to do that; are you working with GE to try to resolve this issue?

As for putting louvers/openings in the top of the control panel, cutting holes in the cabinets can be problematic--any small metal shavings/shards which aren't captured and held in place can easily make their way into I/O Packs and other electronics which usually have openings in the top for heat dissipation.... It would take some good planning and probably several magnets mounted to a thin board which FULLY encloses the top of the cabinet, but which can be removed without losing any metal particles not caught/held by the magnets. Not impossible, but it would take some thought and planning to prevent "collateral damage."

If you were to put louvers or openings in the top of the control panel, it would be a good idea to put a "hood" over the top of the openings, about 5-10cm (2-4 inches) above the louvers/openings to try to prevent dust from falling into the opening and down into the control panel. Dust can be very, very intrusive. And, if you put the fans at the top of the panel they would probably draw air in through the upper openings which would not be good for the I/O Packs at the bottom of the panel.

Hope this helps--and write back to let us know how you fare in resolving this problem!
 
Badr,

Have you measured the air temperature entering the bottom of the control panel versus the compartment temperature versus the air temperature exiting the control panel? Is it possible that there's some issue with the air circulation outside of the control panel that's preventing cool/cold air from entering the control panels?
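Those measurements are easiest to interpret as differences. The sketch below simply computes the temperature rise at each stage; the four readings are invented for illustration and the function name is hypothetical. A large room-to-inlet rise would suggest cool air is not reaching the lower louvers, while a large inlet-to-outlet rise would point to too little flow through the panel for the heat being generated.

```python
def temperature_rises(room_c, inlet_c, internal_c, outlet_c):
    """Return the temperature rise (deg C) at each stage of the air path."""
    return {
        "room_to_inlet": inlet_c - room_c,
        "inlet_to_internal": internal_c - inlet_c,
        "inlet_to_outlet": outlet_c - inlet_c,
    }


# Hypothetical readings: room air, air at the lower louvers, air near the
# switches at the top of the panel, and air leaving the upper louvers.
rises = temperature_rises(room_c=22.0, inlet_c=30.0, internal_c=51.0, outlet_c=45.0)
for stage, rise in rises.items():
    print(f"{stage}: {rise:.1f} deg C")
```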
 