Unusual Tripping in Mark VI


Thread Starter

ProcessValue

Well, we faced an unusual trip on a Frame 5 machine. We have a Mark VI control system (TMR) running a Frame 5 machine in the refinery. I am giving the SOE of the trip so that experienced members can get a clear view.

1. A VCMI diagnostic alarm from processor T appeared first: "Using default input data from T8 (VPRO module Z)". This was recorded in all three R/S/T processors.

2. A VPRO diagnostic alarm came for the following:
"Voting mismatch in the following signals"
a. Speed probe
b. Emergency push button
c. Cold junction temp
The alarm kept setting and resetting during the 12 minutes leading to the trip.

3. After 12 minutes the following alarms appeared:
"Using default input data from R8 (VPRO module X)", recorded only in the R processor diagnostics, and "Using default input data from S8 (VPRO module Y)", recorded only in the S processor.

4. The machine tripped on a speed sensor trouble trip (this happens when the difference between the protection speed sensors used in the protection core (77HT-1,2,3) and the control speed sensors (77NH-1,2,3) exceeds 5%).

I analysed the trip and have come to some conclusions:

1. The first alarm appeared because the real-time process value from the VPRO Z module failed (perhaps due to an IONET failure).

2. The second alarm is the next logical step in the sequence: as the system is TMR, the voting resulted in median value selection. As the speed probes from X and Y were showing nearly the same value and the one from Z was zero, this alarm appeared (see the voting sketch after this list).

3. This is where the problem starts: the communication from X and Y also failed, leading to the tripping of the machine. Why should this happen?
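For anyone following along, a minimal sketch of the median (middle-value) selection I am describing, assuming a simple three-input software vote; the actual Mark VI voting implementation is GE's and may well differ:

def median_vote(x, y, z):
    """Return the middle value of three redundant inputs (TMR median select)."""
    return sorted([x, y, z])[1]

# Hypothetical speed readings in % of rated speed: X and Y read ~100 %,
# Z's input has failed to zero, so the voted value stays healthy.
print(median_vote(100.1, 99.9, 0.0))   # -> 99.9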

On analysing the trip log, it was seen that TNH_OS (the protection speed signal) had gone to zero for only 50 ms and then regained its former correct value.

So even though it is clear what caused the trip, it is not clear why this should happen.

a. Why should the communication in all three protection cores fail?

b. Why should the second and third diagnostic alarms, indicating communication failure from X and Y, be recorded only in R and S respectively, and not in all three, which is the norm?

c. Why should the communication re-establish itself within the next 50 ms? And now that the machine is running, the problem is not occurring.

As all the communication failed, I suspected a common source for the problem, so I tightened the connections on the TPRO and changed the three IONET cables from the VPRO to the VCMI. But recently, 48 days after the trip, it happened again. Now I am at a loss as to what to do. The trip is similar in nature. We have restarted the machine and it is running fine now, but the problem remains unclear. Any ideas why this should happen?
 
Dear ProcessValue, it indeed sounds like an interesting trip, and it seems you have done some good detective work, but don't you hate the intermittent faults that you can't recreate?

My guess in this situation would be some sort of intermittent failure in the I/O net, as you obviously suspected, leading to the replacement of the I/O net cables.

Since the problem has since recurred, my next guess would be some sort of very intermittent failure from a device on the I/O net. My first guess would be an intermittent failure from the X core, since you mention that it was in and out for 12 minutes before the first trip. My thought is that possibly the X core is somehow jamming communication between all the other cores intermittently? I am assuming that all that is on the I/O net are the MKVI cores and the protection cores; if there is anything else on the I/O net, have a good look at those devices as well.

But verifying this will be difficult, if not nearly impossible. I have seen at least half a dozen failures of VPRO cores in our machines; they seem to be the weak link of the MKVI, besides the power supplies anyway.

If I were in your place, I would most likely have to resort to swapping out a component at this point, which I don't like doing, since it is a poor way to repair a problem. But something this intermittent is almost impossible to diagnose in a methodical way. I would hope that your superiors understand the difficulty of the problem. I know that MKVI components are not cheap, but neither are lost generation and machine downtime.

I wish you luck and look forward to your feedback on what you find and repair.
 

ProcessValue

Well, what do you do if you don't know the answer? You change the question ;).

I have given recommendations for two things:

a. Replacement of the VPRO Z core.

b. The speed sensor input trouble is an instantaneous trip; I have asked for a timer (2 sec) to be included so that it will trip only if the fault persists for more than 2 sec. As in both cases the values failed for only 50 ms, this will circumvent the problem.
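Just to show the behaviour I am asking for, a minimal on-delay sketch, assuming a scan-based evaluation; the class name, scan time and the 2 s constant are placeholders of mine, not actual Mark VI blockware:

class OnDelay:
    """Pass a fault through only after it has stayed TRUE for `delay_s` seconds."""
    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.accum = 0.0

    def update(self, fault_in, dt_s):
        # Accumulate time while the fault is present; reset when it clears.
        self.accum = self.accum + dt_s if fault_in else 0.0
        return self.accum >= self.delay_s

# A 50 ms glitch never makes it through a 2 s on-delay:
timer = OnDelay(2.0)
tripped = False
for _ in range(5):                  # fault present for 5 scans of 10 ms = 50 ms
    tripped = timer.update(True, 0.010)
print(tripped)                      # -> False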

I have written to BHEL; they do not know how this should happen. They say they have referred it to GE, and I am still to get a reply.

What you mentioned is correct. I thought along similar lines, that the Z core was somehow jamming the IONET communication. But all other data transmissions between the VCMIs were going on correctly. Why only the VPRO modules should get garbled is a mystery. If the whole IONET were down, then voting would not have taken place. And the IONET is triple redundant, with the VPRO cores connected to each of the three IONET buses.

I noticed that IONET uses the CSMA/CD protocol for communication. I read the description on Wikipedia and this is what I found:
"CSMA/CD is a modification of pure Carrier sense multiple access (CSMA). CSMA/CD is used to improve CSMA performance by terminating transmission as soon as a collision is detected, thus reducing the probability of a second collision on retry.A jam signal is sent which will cause all transmitters to back off by random intervals, reducing the probability of a collision when the first retry is attempted."

So my guess is that something like this would have happened. GE has not provided any detail about how the data sequencing/collection is done on IONET. Though I am pretty sure the answer lies in the workings of IONET, insufficient knowledge in that area is a dampener.
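Just to make the quoted mechanism concrete, here is a toy sketch of collision detection with random backoff on a shared bus. This is a generic CSMA/CD illustration only, not how IONET actually schedules its frames (GE does not document that), so the station names and slot timing are assumptions of mine:

import random

def transmit_with_backoff(stations, max_slots=50):
    """Toy CSMA/CD: if more than one station transmits in the same time slot,
    all of them detect the collision and retry after a random backoff."""
    next_slot = {s: 0 for s in stations}        # slot each station will try next
    for slot in range(max_slots):
        senders = [s for s, t in next_slot.items() if t == slot]
        if len(senders) == 1:
            print(f"slot {slot}: {senders[0]} transmitted OK")
            del next_slot[senders[0]]
        elif len(senders) > 1:
            print(f"slot {slot}: collision between {senders}, backing off")
            for s in senders:
                next_slot[s] = slot + 1 + random.randint(0, 3)
        if not next_slot:
            break

transmit_with_backoff(["R", "S", "T"])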

Right now all I can do is hope that the next time it occurs, it does not persist for more than 2 sec. lol.
 
PV,

So, you want to run without emergency overspeed signals for two seconds? I don't think the logic discriminates on which signal is out of spec, just that the primary and emergency speed signals disagree. The primary signal, if it goes low, will cause fuel to be increased to try to increase speed.

Based on your "change the question" statement, sounds like a typical management response to a problem: "Get the I&C tech to put some "fix" in place so we can keep running! It's a programmable control system--program it to keep the unit running at any cost! We have bonuses to collect!"

I was going to suggest that the problem may lie with the speed pick-ups or speed pick-up wiring. Shield grounding; capacitance issues; pick-up problems; improper gap; etc. Some companies aren't really known for the best installations or wiring practices. If the wires aren't properly segregated from others, there can be induced voltage/noise issues, as well.

But, hey, go with the time delay! Never mind that hundreds of other turbines around the world don't need, nor do they have, time delays. When the unit disintegrates there will be time to find the root cause.

And remove the time delay before the re-start.
 
Dear PV,

It seems you may be doing things as you are directed; I've been there and done that.

But a two second delay for speed pickups is pretty scary! I haven't been around that long but I've seen machines do some pretty incredible things they were never designed to do in milliseconds, let alone seconds.

You have a difficult problem to diagnose, but trying to patch the problem never has seemed to work well from what I have seen in my short time here.
 

ProcessValue

Dear CSA, hurray for the tongue-in-cheek answer!! But I have a few points to make.

I am not introducing any time delay for the overspeed protection. I am recommending a time delay for the HP overspeed fault input trouble signal.

The block used for overspeed protection is "L12HV1 gas_turbine". The block does discriminate between the loss of signal from the protective speed probes and that from the control speed probes.

The relevant signal is "L12HFDP - HP overspeed fault, protective input trouble". What it does is check the following condition:

(TNH - TNH_OS) > TNKHDIF (TNKHDIF set at 5%)

There is another signal, "L12HFDC - HP overspeed fault, control input trouble", which does the reverse:

(TNH_OS - TNH) > TNKHDIF

Neither logic takes the absolute value, so if TNH_OS fails to zero, only L12HFDP will come on.

In our machine it is configured to protective trip 3, and I am recommending a minimal time delay for L12HFDP only. The new signal will be, say, "L12HFDP_TD"; instead of L12HFDP I will be feeding this new signal to the trip status logic.
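In equivalent pseudo-logic, this is roughly how I read the two checks above (a sketch of my understanding only; the constant and signal names follow the application code described above, the Python framing is mine):

TNKHDIF = 5.0   # allowable control/protective speed difference, in % of rated speed

def l12hfdp(tnh, tnh_os):
    """HP overspeed fault, protective input trouble: control speed (TNH) reads
    higher than protective speed (TNH_OS) by more than TNKHDIF. One-sided, no abs()."""
    return (tnh - tnh_os) > TNKHDIF

def l12hfdc(tnh, tnh_os):
    """HP overspeed fault, control input trouble: protective speed reads
    higher than control speed by more than TNKHDIF."""
    return (tnh_os - tnh) > TNKHDIF

# If TNH_OS drops to zero while TNH stays near 100 %, only the protective
# input-trouble signal picks up:
print(l12hfdp(100.0, 0.0), l12hfdc(100.0, 0.0))   # -> True False

L12HFDP_TD would simply be the L12HFDP output run through an on-delay like the one sketched earlier, before it reaches the trip status logic.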

I am not compromising on the overspeed logic; it is still there and will very much protect the machine. Yes, this is a makeshift solution and I am not proud of it, but in my opinion it is an effective one to counter the problem.

The loss of the control speed probes will trigger "L12HFDC" and trip the machine, so the possibility you mentioned will not arise.

I checked the cables and retightened them, and I checked for ineffective grounding and stray DC shifts in all the sensors on the TPRO board. I did not find any. If cabling were the problem, then why should it affect only the Z core, and only after 12 minutes lead to the collapse of the VPRO communication? In my view the solution will probably lie in the IONET communication scheme. I am poring over it and will give my feedback if I come across something.

CSA, I know I am a very junior member here, only going to be twenty-three, but after commissioning 4 gas turbines in my refinery I am expected to find a solution. I cannot go inside the room and shoot blanks, lol. I know I am recommending a makeshift solution, but under the circumstances I cannot think of anything else. If you do have a solution I will be grateful to try it out :).

The machine was restarted and is now running perfectly. I am recommending these measures to avoid future spurious tripping.

Thanks for your valuable comments and frank opinions :)
 

ProcessValue

I would like to add one more point.

The VPRO modules have separate speed sensors for all three protection cores. Software voting does not take place in the protection cores; instead, hardware voting takes place in the TREG boards with inputs from the VPRO. This voting does not have any time delay, and if the protective speed inputs detect an overspeed it will trip the machine instantaneously. Thus, in my opinion, the protective EOS system is not compromised in any way.
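For completeness, a minimal sketch of the two-out-of-three trip voting principle that the TREG relay arrangement implements in hardware (this is a software illustration of the principle only, not GE's relay circuit):

def two_out_of_three(trip_x, trip_y, trip_z):
    """Trip asserts only when at least two of the three protection cores demand it."""
    return sum([trip_x, trip_y, trip_z]) >= 2

print(two_out_of_three(True, False, False))   # -> False, one bad core alone cannot trip
print(two_out_of_three(True, True, False))    # -> True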
 

ProcessValue

Dear MikeVI

I know it was overkill to suggest 2 sec. I will be reducing it to 200-500 ms after discussion with the BHEL guys. What can I say, thanks for understanding my position ;).
 
Process Value... It is highly unlikely that the speed sensors will operate simultaneously, thus providing a natural delay of several milliseconds!

Phil Corso
 