Help... System response time too slow (Large message!)

Kelly, Rick

Friends...

I am looking for advice or any possible insights any of you may have on how to proceed next with the issues outlined below.

At this Kraft Canada facility we produce bulk cheese (640 lb blocks), liquid whey concentrate, whey powders, cheese powders and ready for retail wrapped pieces of cheese (sticks, blocks, single portions, powders and shreds). I have responsibility for all Y2K remediation efforts at this and two other Kraft Canada plants.

Here is my problem...

In one section of the facility we have a batching and drying operation. In this operation, various cheese products and other food grade ingredients are mixed together to form a slurry. This slurry, after being cooked and pasteurized in a horizontal laydown cooker (that uses culinary grade injected steam), is pumped to the top of a conical shaped rotary spray dryer. In the spray dryer, the water content of the slurry is evaporated by heated air. The output of the spray dryer is "Cheese Slurry Powder" that falls to the bottom of the dryer as cheese "snow". Further processing then finishes the powder and prepares it for packaging.

The batching & cooking area and the spray dryer section shared a common control structure. This structure was laid out as follows... A Siemens TI 545-1104 took care of process control in the batching & cooking area. A Siemens TI 555-1102 did the same job in the spray dryer area. The 545 had only local I/O connected to it, while the 555 had both local I/O and 4 remote I/O racks. Both PLCs were connected (by NIMs) to a Ti-Way network. The Ti-Way network was then connected to an AST 80386 PC running SCO Unix as its OS. Running on the Unix platform was the TI SCADA package called "Ti-Star". The main "Ti-Star" terminal was the HMI for the spray dryer area while a secondary "Ti-Star" terminal (on a 386 DOS box that talked RS-422 back to the Unix server) performed the HMI function for the batching & cooking area.

Along comes Y2K and we discover that the old Ti-Star application sitting on the SCO Unix box is not Y2K compliant and must be replaced. So we built a new system...

Our corporate standard called for a solution that consisted of USData's Factory Link ECS (the Y2K compliant 6.1 for Unix version) sitting on top of HP-UX (10.02... also a compliant version) running on HP9000 class machines. Ethernet was identified as the media of choice. The older 545-1104 in the batching & cooking area was replaced with a new 555-1106 due to a need for more memory.

A Cisco Catalyst 1912 switch was to act as the "hub" for all of the Ethernet connections. This switch provides us with a dedicated 100 Mb channel for the HP server and a full 10 Mb of bandwidth on each of the other ports. In order to connect the PLCs to the Ethernet media, two TI 505-2572 (the CTI card) Ethernet/Serial cards were purchased for installation in the local racks.

It was decided to use NCD's Explora 700 XWindows thin clients along with "BIG" (wish I had them at home... LOL) monitors as the Operator stations.

So connected to the switch we have... The 100 Mb channel for the Server, a 10 Mb channel to the 555 on the dryer, a 10 Mb channel to the other 555, a 10 Mb channel to the dryer Op station (HUGE XTerm) and another 10 Mb channel to the other Op station located in the batching & cooking area. Cisco diagnostics have reported a demand loading on the switch of only 3%.

The programs in both PLCs were created using TI's APT (a pre IEC 1131-3 standard language that includes SFCs and CFCs in a graphical construct, an excellent tool by the way). The remediation required no changes to the base programs other than reading and writing variables in multiple blocks.

The HP9000 we purchased has more resources than Carter has Liver Pills... It has a 160 MHz RISC CPU, 256 MB of ECC RAM, 1 MB of L2 Cache, Twin 100BaseT ports, Twin Hot Swappable Mirrored 9 GB SCSI drives, a 12 speed SCSI ROM drive, a 4 GB DAT drive and a monster of a UPS. In essence this platform is... every grown-up "Commodore VIC-20" fan's dream.

So what is the problem... The response time of the system is ridiculously slow. From the station in the batching & cooking area we see response times of 4 to 5 seconds, while on the dryer portion of the system it is not uncommon to have the HMI update 10 to 12 seconds after an event. With the resources that exist in the HP9000 and an Ethernet segment that is closed to other traffic, this control system should scream. To date we have done the following...

1) Ensured that ... the switch is in "Full Duplex" mode

2) Checked loading on the switch (its at 3%)

3) Ensured that HP-UX is configured as per our standard for optimized performance (we have about 50-60 similar systems in operation at other plants).

4) Removed some HP-UX daemons that are not truly required to be in memory.

5) Verified that communication of tag data across the system is as homogeneous as possible. The EDI driver that is used (Dastec) takes block reads and writes into consideration and performs optimization on its own as well.

6) Sent an MPS (Multi Platform Save) file of the FL ECS source code to USData for analysis to ensure there are no issues with the application as built by the integrator, who by the way have been excellent at helping to address this issue.

Identified Possible Solutions...

1) Migrate the FL ECS code to Compiled Math and Logic from the current Interpreted Math and Logic. Means switching from PowerVB source code to compiled C++ object code that is then linked in to create the executable.

2) Migrate the FL ECS code to an exception read/write basis where if a variable changes then the impacted variable is read/written instead of the current situation where we read/write the entire tag dictionary. Means going from two routines to literally hundreds of sub routines.

At this point my prime theory is that communication traffic across the backplane of the 505 rack between the CPU module and the 2572 Ethernet card is restricting throughput. I would like to move from an Ethernet EDI interface into the HP server to a Profibus one. As the Profibus port is native to the CPU module, this removes any chance of the backplane impacting communications throughput. As this is an expensive fix to try... I am looking for comments from this august body.

I know this message was long winded... but I wanted to give you all as much data as possible. Thank you for your time and I look forward to any comments from the group. There are 2 simple .bmp diagrams that illustrate this -- if you'd like to see them, contact me directly.

Best Regards... Rick Kelly

Project Technician
Cheese Operations
Kraft Canada
(613) 537-8069 V
(613) 537-8057 F
[email protected]
 
Tatjana & Zeljko

(Originally posted 10/18/99)
Rick,

I have used similar hardware on the PLC side (TI 505-1104 with CTI 505-2572 and NIM modules), probably with an I/O count similar to the one installed at your facility. The response times we see are usually between 1 and 2 seconds after a field event has taken place. The databases are between 1000 and 2000 analog and digital tags (both physical and PLC internal I/O points), with tags divided into several groups. More tags would definitely require longer update times, but definitely not that long. We used ladder logic for device control, as well as TI's excellent Special Functions (SF) feature for process related control and calculations.

The SCADA packages were Windoze based, and compared to TIWAY communication, Ethernet was a huge step ahead.

There are several steps that can help determine where the problem is.
1) PLC h/w: with RLL, scan times can go up to 100 ms, thus changing the timing for communication services (you should check how often the PLCs allow for them); TIWAY speed between racks is, as far as I can remember, 9600 baud; the CTI 505-2572 has its limitations too (you can check that in the 2572's Programming Reference Manual: how many words can be sent/received with one read/write request, as well as how many requests you can send in a second).
2) Network h/w including the PC's NIC, without the SCADA package (you can start with the ping command to check the network's response time).
3) Take a peek into your SCADA package (more serious programs have system logs!).
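
[Editor's sketch] The limits mentioned in step 1 can be turned into a rough best-case estimate before anything is measured. The Python below is only an illustration: the function name is made up for this example, and the figures are placeholders, not values from the 2572 manual. Substitute the real words-per-request and requests-per-second limits once you have looked them up.

```python
import math

def min_poll_time_s(total_words, words_per_request, requests_per_second):
    """Best-case seconds to read total_words once, ignoring every other delay."""
    if words_per_request <= 0 or requests_per_second <= 0:
        raise ValueError("limits must be positive")
    # Each request carries at most words_per_request words.
    requests = math.ceil(total_words / words_per_request)
    return requests / requests_per_second

# Placeholder figures for illustration only (NOT taken from the 2572 manual):
# 2000 words of tag data, 100 words per request, 20 requests per second.
estimate = min_poll_time_s(2000, 100, 20)
print(f"best-case full poll: {estimate:.2f} s")
```

If the estimate is already close to the delay the operators see, the module limits deserve a closer look; if it is far smaller, the time is probably being lost somewhere else.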

Before doing something similar to what I listed above (without my intention of being a smart a..), you should consider talking to:
1) System Integrator (especially if the system is still under warranty) - they should be able to resolve this problem, otherwise ...
2) Your Siemens supplier (try calling Ontor in Toronto; Mario Tome is an excellent person for resolving PLC and SCADA issues) or Siemens TI technical support
3) (shameless I admit) call us, AutoComp Systems at (905) 415-9363, Toronto

Zeljko Fucek
Applications Engineer
 
Darold Woodward PE

(Originally posted 10/18/99)
It is apparent here that all of the building blocks appear to be adequate while the overall performance is unacceptable. My suggestion is to do some testing to determine where the time is spent. I have constructed similar tests with much simpler systems here.

The basic concept is to send a control command and monitor for when the new process condition feedback arrives at the command station. Then, devise some observations along the path so that you can construct a simple timing diagram to see where and when the message gets to different locations.

The idea is not to get too worried about seeing everything at once. You should take some overall measurements and then take other measurements to decide where the bottleneck is. For example, if you can use a network analyzer and maybe an o'scope you can measure how long it takes the command to get out of the master and hit the network. Perhaps you can make small modifications in the PLC to find out how long after the message hits the wire it makes it into the PLC, etc.

If you pick the tests carefully, each one will tell you to look either upstream or downstream for the slow point. Also, my tests have revealed something that you seem to be seeing. Network speed is a "red herring" as far as overall performance is concerned. Even at much slower serial port speeds (provided your network is efficient and uses data sent only when it changes rather than poll/response of all registers) your limiting factor is usually processing of the message on either end.

I think a little data gathering may be of help before you try "expensive" solutions and might allow you to justify the costs if you need the costly solution.

Darold Woodward PE
SEL Inc.
[email protected]
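
[Editor's sketch] The timing-diagram idea in the post above can be tabulated very simply once a few observations have been collected. The Python below is a minimal illustration, assuming the timestamps come from whatever tools are on hand (analyzer, scope, a small PLC change). The stage names and numbers are hypothetical and are not measurements from this system.

```python
def hop_times(marks):
    """marks: list of (stage, seconds). Returns (from, to, elapsed) per hop."""
    ordered = sorted(marks, key=lambda m: m[1])  # put observations in time order
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]

def slowest_hop(marks):
    """Return the hop with the largest elapsed time, or None if under two marks."""
    hops = hop_times(marks)
    return max(hops, key=lambda h: h[2]) if hops else None

# Hypothetical observations, in seconds since the operator issued the command.
observations = [
    ("operator command", 0.00),
    ("request on the wire", 0.05),
    ("PLC output changes", 0.30),
    ("feedback on the wire", 0.40),
    ("HMI screen updates", 4.50),
]

for start, end, elapsed in hop_times(observations):
    print(f"{start} -> {end}: {elapsed:.2f} s")
print("look first at:", slowest_hop(observations))
```

With only a handful of marks, the largest hop tells you whether to add the next observation point upstream or downstream.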
 
(Originally posted 10/18/99)
We had a similar problem using the TIs on Ethernet. As it turns out, the TI PLC becomes bus-bound in its communication to the 10BaseT card. The Ethernet side will communicate at 10 Mb/s but the PLC side is several orders of magnitude slower. The only real solution is to upgrade PLCs - and if you're going to do that you may as well go for one of these state-of-the-art little boxes that have been recently released.
 
(Originally posted 10/15/99)
Rick,

For the sake of message size I have only included the outline of your new system & the possible suggestions you make. It's still a pretty long message.

Firstly, what sort of response times do you actually require, ie, what are you aiming for? What did you get before? And, are the response times you quote times from plant action (eg an on/off event on a digital input) to entering the FL tag database via the driver read tables, or to actually updating the screens?

>1) Migrate the FL ECS code to Compiled Math and Logic from the
>current Interpreted Math and Logic. Means switching from
>PowerVB source code to compiled C++ object code that is then
>linked in to create the executable.

This is a little confusing to me. Interpreted M&L and PowerVB are two different animals. Converting IML to CML means changing the procedure type to Compiled; the mkcml utility then takes care of the compilation/link of the code and, as you say, turns it into C (not C++) code. This can certainly give dramatic performance improvements but it's impossible to quantify without analysing your M&L. This may not be an overly difficult exercise.

PowerVB is linked to display objects. If you have a lot of PowerVB code and complex mimics, then it's possible that it is slowing down your display updates/refreshes. This is impossible to determine from this distance :) but it's a reasonable rule of thumb that if you are using PowerVB to do a lot of repetitive numeric processing on multiple on-screen objects then it may be worth considering converting that code to Math & Logic and keeping PowerVB for strictly 'display object related activity' such as handling popups etc. This is only a suggestion - as I mention it is impossible to say without seeing the application and understanding how you use PowerVB.

But just going back to your original point, the IML and the PowerVB are separate and looking at these is really 2 separate exercises.

>2) Migrate the FL ECS code to an exception read/write basis where
>if a variable changes then the impacted variable is read/written
>instead of the current situation where we read/write the entire tag
>dictionary. Means going from two routines to literally hundreds of
>sub routines.

Can you explain more of what you mean? What do you mean exactly by read/write the entire tag dictionary? Is this in IML, PowerVB, something else? Certainly if you have much repetitive code on thousands of tags and it only needs to be done when a value changes, then moving to an exception basis can produce a dramatic improvement. FactoryLink has many possible ways to do this exception-type processing.
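
[Editor's sketch] For readers unfamiliar with the idea, "exception basis" simply means passing along only the values that changed since the last pass rather than the whole tag dictionary. The Python below is a generic illustration and is not FactoryLink's mechanism; the class, function and tag names are invented for the example.

```python
_MISSING = object()  # sentinel so a tag seen for the first time counts as changed

def changed_tags(previous, current):
    """Return only the tags whose value differs from the last known value."""
    return {
        name: value
        for name, value in current.items()
        if previous.get(name, _MISSING) != value
    }

class ExceptionWriter:
    """Forward only changed tags to a send callable (hypothetical interface)."""

    def __init__(self, send):
        self._send = send
        self._last = {}

    def update(self, current):
        delta = changed_tags(self._last, current)
        if delta:
            self._send(delta)  # nothing is sent when nothing changed
        self._last.update(current)
        return len(delta)

sent = []
writer = ExceptionWriter(sent.append)
writer.update({"dryer_temp": 180, "feed_pump": True})  # first pass: both tags sent
writer.update({"dryer_temp": 180, "feed_pump": True})  # no change: nothing sent
writer.update({"dryer_temp": 181, "feed_pump": True})  # only dryer_temp sent
print(sent)
```

The saving depends entirely on how often values really change; a tag that toggles on every pass gains nothing from this approach.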

Going back to the overall picture, it seems that so far you have not really established whether the slowness comes from the application and its design or from the comms. Before you consider throwing out the comms and moving to Profibus, which as you say sounds expensive, I would suggest putting more effort into identifying where the slow areas are. It may be relatively easy in your particular application to move to compiled M&L and your application will probably run much better as a result, so it may even be feasible to get your integrator to do that conversion and see what it gains you. It is impossible to be specific - maybe your application does not actually have much M&L, but if it is used heavily then you would expect to see a change by moving to compiled and it may be a very big change. The same applies to the PowerVB.

So the 64k$ question is what has been done so far to analyse the performance and cpu loading in order to ascertain where processor time is going? At this stage, as I said before, from your description you don't know whether it is the application or the comms that is most guilty (it may turn out to be both).

Sue Platt
[email protected]

www.spcs.com
+44 1732 872 229 tel
+44 1732 872 231 fax
SPCS, 16-18 New Road, Ditton, Aylesford, Kent, ME20 6AD, UK
 
Jeff Gazidis

(Originally posted 10/18/99)
Rick,
You have wonderful technology. My feeling though is that the issue with your system is not the processing power. From what I can understand from your description the issue might be in the communication methodology. Ethernet is a message based system based on collision detection. Your concern is to have events generated and processed in a determined amount of time. Therefore, your control strategy might be better served by a communication network that is deterministic (other buzz word - producer/consumer). These types of communication schemes allow for the effective transmission of data in a predetermined time window. Your architecture relies on the operation of the OS and network hardware to pass information. Hope this information gives you some other point of view on the issues.

Best of luck.
Jeff Gazidis
Engineer at large.
 
Ralph Mackiewicz

(Originally posted 10/19/99)
> Ethernet is a message based system based on
> collision detection. Your concern is to have events generated and
> processed in a determined amount of time. Therefore, your control
> strategy might be better served by a communication network that is
> deterministic (other buzz word - producer/consumer). These types of
> communication schemes allow for the effective transmission of data in a
> predetermined time window.

Producer/Consumer (or publish/subscribe as it is called outside of the DeviceNet/ControlNet community) does not make a network "deterministic". Usually determinism comes from the use of a deterministic data link layer (layer 2 - the layer responsible for accessing the wire). Producer/Consumer services are a function of the application layer. There are numerous producer/consumer technologies that run over Ethernet.

I seriously doubt that the performance problems mentioned that prompted this response have anything at all to do with Ethernet's media access control algorithm (i.e. CSMA/CD). See Mr. Woodward's response for some good suggestions as to how to find this problem.

Regards,
Ralph Mackiewicz
SISCO, Inc.
 
Friends...

A while ago (10/15/99) I sent a message to the Automation List detailing a problem I was having with system response time on a Factory Link HMI. Several of you replied either to me directly or through the list itself. I wish to thank each of you for responding. Your suggestions helped us to iron out the issues. I must apologize for not getting back to you all with the final outcome to this problem before now. I have been... shall we say... very busy.

Following the many avenues of investigation that you good people offered allowed us to track down the root issue. Darold Woodward suggested that we throw a protocol analyzer across the system to allow us to view the actual packets moving around the system. The built-in analyzer in HP-UX was used for this purpose. What this showed us was that the Factory Link daemon was flooding the system with write packets to the PLCs.

<Simplified Explanation Switch>

FL issues a read request to the PLCs then processes the data that is returned. This takes place in a loop construct, so that when the read/process cycle is finished it starts over again. When a write -> PLC request comes in from the Op station the FL daemon stops whatever it is doing... it then processes the write request and returns to its previously pre-empted task.

</Simplified Explanation Switch>

What was happening was that an error in an FL routine was setting a bit false (causing the write request to execute). A different routine was seeing the bit go false and then setting it true again (causing the write request to execute). This in turn tripped the first routine once again... and over and over and over... thereby preventing the proper system response from showing. Once the offending routines were fixed... system response improved to under 0.5 of a second.
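
[Editor's sketch] The actual FL routines are not shown in this thread, so the Python below is only a toy model of the fault as described: two routines fighting over one bit, where every change of the bit triggers a write to the PLC. The function and parameter names are hypothetical.

```python
def run_scans(scans, routine_a_buggy):
    """Count write requests generated over a number of logic passes."""
    bit = True
    writes = 0
    for _ in range(scans):
        # Routine A: the faulty version clears the bit on every pass.
        if routine_a_buggy and bit:
            bit = False
            writes += 1  # each change of the bit triggers a write to the PLC
        # Routine B: restores the bit whenever it sees it false.
        if not bit:
            bit = True
            writes += 1
    return writes

print("writes with the fault:", run_scans(1000, routine_a_buggy=True))
print("writes after the fix: ", run_scans(1000, routine_a_buggy=False))
```

In this model the faulty pair produces two writes on every pass and none once the fault is removed. Because each write pre-empts the read/process loop, a storm like this would starve the screen updates regardless of how fast the network or server is.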

Once again... thank you.

Best Regards... Rick Kelly

Chief Technician
Natural Cuts
Cheese Operations
Kraft Canada
(613) 537-8069 V
(613) 537-8057 F
[email protected]
 