Modbus/TCP listen() backlog and RST


Ian Goldby

If a Modbus/TCP client cannot get an immediate connection to the server, should it be left waiting, or should it get an immediate notification from the server?

There seems to be some disagreement between Microsoft and (most of) the rest of the world as to what should happen when the server's listen() backlog is full. The Microsoft TCP/IP stack sends back a RST, which is the same response the client would get if there were no listening socket. Other TCP/IP stacks send back nothing (as if the request had gone into a black hole) and the client will then repeat its initial SYN a few times and finally time out if it continues to get no response. This usually takes several seconds.

In the former case the client knows immediately that it cannot be serviced, but may consider this to be a permanent error.

In the latter case the client may hang until either the connect attempt times out or another client disconnects and the server now has free resources to service the new client.
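To make the two outcomes concrete, here is a minimal Python sketch (the language is chosen only for illustration) of a client that makes a single connect attempt and reports which case it saw: an RST surfaces as `ConnectionRefusedError`, and unanswered SYNs surface as a timeout. The function name `try_connect` and the 3-second default are hypothetical choices, not part of any Modbus library. Note that an RST looks the same whether the backlog is full or nothing is listening at all, which is exactly the ambiguity described above.

```python
import socket
from typing import Optional, Tuple


def try_connect(host: str, port: int,
                timeout_s: float = 3.0) -> Tuple[str, Optional[socket.socket]]:
    """One TCP connect attempt (hypothetical helper).

    Returns ("connected", sock), ("refused", None) if an RST came back
    (full backlog on some stacks, or no listener at all), or
    ("timeout", None) if the SYNs went unanswered for timeout_s seconds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)          # bounds the whole connect, SYN retries included
    try:
        sock.connect((host, port))
    except ConnectionRefusedError:      # server answered the SYN with an RST
        sock.close()
        return "refused", None
    except socket.timeout:              # silence: the "black hole" behaviour
        sock.close()
        return "timeout", None
    except OSError:                     # anything else (unreachable, etc.) propagates
        sock.close()
        raise
    return "connected", sock
```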

Added to this is the complication of the backlog itself. If a connection backlog is allowed (and this again is TCP/IP stack-dependent; some stacks don't let you set the backlog to 0) then the client may think it has negotiated a successful connection but when it sends its first Modbus message it will be ignored, causing another wait and possible timeout.
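The backlog effect can be reproduced on loopback with a small Python sketch. The listener below calls `listen()` but never `accept()`, so the kernel completes the handshake from its backlog and the client's connect succeeds, yet the Modbus request it sends is never read. The frame builder follows the standard MBAP layout (transaction id, protocol id 0, length of the remaining bytes, unit id) followed by a function 0x03 "Read Holding Registers" PDU; the helper name and the register values are illustrative assumptions.

```python
import socket
import struct


def read_holding_registers_request(transaction_id: int, unit_id: int,
                                   start_addr: int, count: int) -> bytes:
    """Build a Modbus/TCP function 0x03 request (hypothetical helper name).

    MBAP header: transaction id, protocol id (always 0), length of what
    follows (unit id + PDU), unit id. Then the PDU: function, start, count.
    """
    pdu = struct.pack(">BHH", 0x03, start_addr, count)
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Loopback demonstration: the server never calls accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)                          # backlog of 1, accept() never called

client = socket.create_connection(server.getsockname(), timeout=1.0)  # "succeeds"
client.sendall(read_holding_registers_request(1, 1, 0, 10))           # also "succeeds"
try:
    reply = client.recv(260)              # 260 bytes is the Modbus/TCP frame maximum
except socket.timeout:
    reply = None                          # connected, but no application ever reads it
client.close()
server.close()
```

From the client's side this is indistinguishable from a server that accepted the connection and then stalled, so only a response timeout catches it.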

From the point of view of the client, would you rather get an error back immediately if you cannot be serviced immediately, or would you rather be ignored and left to try again?
 
This is a perfect example of why Modbus/TCP was not intended for use in any emergency stop or safety application where delivery time is critical. M/TCP was intended for non-critical data transfer, and it works very well for that. If you need to transfer critical data, Modbus/RTU is better when implemented on EIA-485 cabling. Modbus/UDP has not been standardized, but UDP delivery itself adds no retransmission delays. M/UDP has no defined error checking either; however, once the delivery protocol is defined for both client and server, it can be highly deterministic and fast. EtherNet/IP uses TCP only for uploads, downloads, and setup, and uses UDP for all data transfers.

Dick Caro
===========================================
Richard H. Caro, Certified Automation Professional, CEO, CMC Associates,
2 Beth Circle, Acton, MA 01720 USA
E-mail: RCaro [at] CMC.us
Web: http://www.CMC.us
Buy my books:
http://www.isa.org/books
Automation Network Selection
Wireless Networks for Industrial Automation
http://www.spitzerandboyes.com/Product/fbus.htm
The Consumer's Guide to Fieldbus Network Equipment for Process Control
Buy this book and save 50% or more on your next control system!!!
===========================================
 
The reason there is a difference between how Microsoft handles connection limits and how everyone else handles connection limits is that those limits mean two different things. For most systems a connection limit normally represents an actual physical resource limit. You are running out of memory, CPU, response time, or some other genuine limit. In that case, ignoring additional connection attempts is the only logical response because you may not be *able* to send any sort of response.

For MS Windows, however, a connection limit is like a tag limit on a SCADA system. When you reach the limit it means "put another coin in the meter" to get your limit raised. The point was to push you to upgrade to a more expensive server version if you needed to run more simultaneous connections. If you reached a connection limit, there were still plenty of resources left to send an RST, because it is a marketing and sales limit, not a technical one.

From the point of view of the client, you would have to look at the environment it is operating in. If you are trying to establish a TCP connection, then that means you don't already have one. That is, you are beginning a process whereby you will communicate; you're not in the middle of an established connection.

If you are trying to establish a connection and you are refused the first time, what is the normal thing that most people would want to do? They would want to retry of course and hope that the problem has gone away. In fact, they would want to keep retrying until some time-out was reached before giving up and declaring that an unrecoverable error was reached. If you defeat the built-in retry mechanisms in TCP/IP that would just mean that users would have to add them back in at the application level.
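If an application does have to add retries back in (for example because the stack answers with an RST instead of staying silent), a minimal Python sketch of "keep retrying until some time-out, then report an unrecoverable error" might look like this. The function name, the doubling backoff, and the default delays are assumptions for illustration; `clock` and `sleep` are injectable only so the logic can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(attempt: Callable[[], T],
                       deadline_s: float,
                       first_delay_s: float = 0.1,
                       max_delay_s: float = 2.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call attempt() until it succeeds or deadline_s has elapsed (hypothetical helper).

    attempt() should raise OSError (ConnectionRefusedError, timeout, ...)
    on failure. Delays double after each failure, capped at max_delay_s,
    and never sleep past the overall deadline.
    """
    start = clock()
    delay = first_delay_s
    while True:
        try:
            return attempt()
        except OSError:
            remaining = deadline_s - (clock() - start)
            if remaining <= 0:
                raise                       # give up: re-raise the last error
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

For example, `connect_with_retry(lambda: socket.create_connection((host, 502), timeout=3.0), deadline_s=30.0)` would keep trying the registered Modbus/TCP port 502 for up to 30 seconds before declaring failure.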

Also, once you *have* established a connection, the server cannot guarantee that it will be able to respond to you. Under normal operation this shouldn't happen, but since we are talking about servers operating at the very limits of their capacity, the question is how things should fail gracefully rather than collapse suddenly.
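Graceful failure on the client side mostly means never waiting forever for a reply. A minimal Python sketch, assuming one outstanding request per connection: send the request, read the 7-byte MBAP header, then read the number of bytes its length field announces, and return `None` if the server goes quiet. The helper names are hypothetical, and the timeout applies to each `recv()` call rather than to the whole exchange, which is a simplification.

```python
import socket
import struct
from typing import Optional


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def transact(sock: socket.socket, request: bytes,
             timeout_s: float) -> Optional[bytes]:
    """Send one Modbus/TCP request and return the complete response frame.

    Returns None on timeout; the caller should then close the socket and
    reconnect rather than reuse a connection in an unknown state.
    """
    sock.settimeout(timeout_s)                     # per-recv timeout (simplification)
    sock.sendall(request)
    try:
        header = recv_exact(sock, 7)               # MBAP header
        _tid, _pid, length, _unit = struct.unpack(">HHHB", header)
        body = recv_exact(sock, length - 1)        # length also counts the unit id byte
    except socket.timeout:
        return None
    return header + body
```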
 
In response to Dick Caro: This example says nothing about Modbus/TCP's suitability for safety applications. We are talking about what happens when a network communications attempt fails. *Any* network can fail, whether it uses TCP, UDP, Ethernet, RS-485, or two tin cans and a piece of string. A "safety system" that can't operate correctly in the face of a network failure isn't a safety system.

As for AB EtherNet/IP's use of UDP, I believe that has more to do with keeping its behaviour consistent with DeviceNet, which is also a datagram-type protocol. They have a lot of legacy protocols to support, and UDP probably offered the easiest transition path for them to Ethernet.

Actual network performance will depend more on implementation details than on any inherent characteristics of TCP or UDP. I have benchmarked TCP based protocols which were several times faster than similar UDP based protocols running on the same hardware at both ends.

For off-the-shelf hardware, the real limits will normally be the CPU power and software efficiency of the field devices. There are very large variations in those, even between different pieces of hardware that use the same protocol.
 
I like your reasoning - why try to defeat the built-in retry mechanism of TCP? I think I'll go with that. Thanks for the advice.
 

Lynn August Linse

Treating this in a different vein:

There is some value in a server delaying the RST; in fact this was one of the "magic tweaks" which Modicon put in the early NOE. Since each client only connected for a short time (technically, one transaction), delaying the rejection of a new client by a few hundred milliseconds meant a socket would likely free up in the meantime.
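A user-space program generally cannot hold back the stack's response to a SYN, so the following Python sketch swaps in a related technique: the server accepts the connection, waits up to a short grace period for a session slot to free up, and only then forces an RST (via `SO_LINGER` with a zero timeout) if none did. The class name, `max_sessions`, and the 0.3-second grace period are hypothetical values, not taken from the NOE; the `"ii"` linger layout matches Linux/BSD.

```python
import socket
import struct
import threading


class DelayedRejectGate:
    """Hypothetical session limiter: wait briefly for a free slot before rejecting."""

    def __init__(self, max_sessions: int, grace_s: float = 0.3) -> None:
        self._slots = threading.BoundedSemaphore(max_sessions)
        self._grace_s = grace_s

    def admit(self) -> bool:
        """True if a slot was free, or freed up, within the grace period."""
        return self._slots.acquire(timeout=self._grace_s)

    def release(self) -> None:
        """Call when a session ends so a waiting client can be admitted."""
        self._slots.release()


def reject_with_rst(conn: socket.socket) -> None:
    """Abortive close: l_onoff=1, l_linger=0 makes close() send an RST, not a FIN."""
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()
```

In a threaded server, each accepted connection would call `gate.admit()`, serve the client and `gate.release()` on success, or call `reject_with_rst(conn)` on failure.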

The other oddity is with cellular and other HIGH LATENCY media. It is easy to get into a spitting match where a client connects, but the response is so slow that the client aborts and retries. Our own tests show that when a cellular network has problems, it is easy to create up to 20 dead, zombie-like sockets in a few minutes. The server is literally accepting the sockets, but the client keeps either aborting the socket or reacting badly to a stale [RST] sent when the server ran out of resources a few moments earlier. Normal TCP keep-alive doesn't help, because it doesn't kick in until a connection has been idle for a long time (2 hours by default, per the RFC recommendations).
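RFC 1122 requires the keep-alive idle interval to default to no less than two hours, but many stacks let a server shorten it per socket. A minimal Python sketch follows; the option names are guarded with `hasattr` because they vary by platform (`TCP_KEEPALIVE` is the macOS name for the idle time), and the 30 s / 10 s / 3 probe values are illustrative assumptions. With those values a dead peer would be detected after roughly 30 + 3 × 10 = 60 seconds of silence instead of hours.

```python
import socket


def enable_fast_keepalive(sock: socket.socket, idle_s: int = 30,
                          interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP keep-alive with shorter timers (hypothetical helper; values illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):            # idle time before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):         # macOS spelling of the same setting
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):           # gap between unanswered probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):             # probes before the connection is dropped
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```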

For cellular support, we have been forced to add a user-selectable tweak to our IP server/listener code which in effect says "Our clients only hold one connection open, so a second connection from the same client IP means the first should be instantly dropped without abort/RST". Of course this only works if the clients are NOT going through a NAT firewall, because behind NAT multiple clients share the firewall's IP as their source address.
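The bookkeeping behind such a "one connection per client IP" option can be sketched in a few lines of Python. This is an illustrative assumption about how it might be structured, not the poster's actual code: `register()` hands back the connection that the new one supersedes, and the caller decides how to drop it. `unregister()` only removes an entry if it is still current, so a superseded connection that finishes later cannot evict its replacement. As the text warns, the whole idea breaks behind NAT, where many clients share one source IP.

```python
import threading
from typing import Dict, Generic, Optional, TypeVar

C = TypeVar("C")  # any connection object; a socket in a real server


class OnePerClientIP(Generic[C]):
    """Hypothetical registry holding at most one live connection per client IP."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, C] = {}
        self._lock = threading.Lock()              # accept loop and handlers may race

    def register(self, ip: str, conn: C) -> Optional[C]:
        """Record conn for ip; return the older connection it replaces, if any."""
        with self._lock:
            old = self._by_ip.get(ip)
            self._by_ip[ip] = conn
            return old

    def unregister(self, ip: str, conn: C) -> None:
        """Forget conn, but only if it is still the current entry for ip."""
        with self._lock:
            if self._by_ip.get(ip) is conn:
                del self._by_ip[ip]
```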
 