Increasing baud rate from 9600 to 19200 fixes issues...why?

I had a unit in the field that was having communication issues (timeouts) when the speed was set to 9600. This is a very simple RS-485 network with just two devices. The master is set for very long timeouts (on the order of 5 seconds). I don't have any information regarding the timeout parameters of the slave, but I'm not sure that's applicable.

When the baud rate was set to 9600, I was getting timeouts on the master pretty frequently. In the past, this master has been working just fine with other devices at 9600 (not the same manufacturer). On a whim, I changed the baud rate to 19200 and, magically, there were no more timeouts anywhere. I moved it back to 9600, and the timeouts returned.

The messaging is not complicated whatsoever, and the spacing between messages from the master is very relaxed (as in, the next message goes out 500 ms after a good response is received).

There also aren't any gaps in the outgoing message whatsoever. I don't have access to the unit in the field, but one here shows a nice clean outgoing waveform at both baud rates.

The cabling is good and termination is present on both ends (the run is only a few dozen feet, so no biggie anyway). Pullup/pulldown biasing is also in place on one end.

I'm at a loss to explain why increasing the baud rate would help this system. Anybody have any ideas?
 
Just to confirm, when you say you're changing the baud rate, you're changing it on both devices, correct?

What do you have the parity and number of stop bits set to for both devices? Make sure they match.

It's possible that your wiring may be the issue. I recommend starting by making sure you have connected 3 wires (+, -, and reference) between the devices, assuming the devices have a reference terminal (it may also be called common, COM, REF, ground, GND, SG, etc.), and that the + and - lines use a twisted pair. At least for an initial test, remove all termination and biasing (pullup/pulldown). With such a short cable run those are not necessary anyway.

Another good test would be to use a laptop, a USB to RS-485 converter, and a Modbus master simulator tool to communicate to the field unit and see if you can duplicate the issue you're seeing with the different baud rates. There are several Modbus master simulators available, but here are a few:

ModScan
https://www.win-tech.com/

Simply Modbus
https://www.simplymodbus.ca/

Modbus Poll
https://www.modbustools.com/modbus_poll.html
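
If a scripted check is ever more convenient than a GUI tool, a few lines of Python with pyserial (pip install pyserial) can act as a bare-bones Modbus RTU master. This is only a sketch: the port name ("COM3"), slave address, and register address below are placeholders, and the 8-E-1 framing is an assumption that would need to match whatever your devices are actually set to.

    import serial

    def crc16_modbus(frame: bytes) -> bytes:
        # Standard Modbus RTU CRC-16, appended to the frame low byte first.
        crc = 0xFFFF
        for byte in frame:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
        return crc.to_bytes(2, "little")

    def read_holding_register(port: str, baud: int, slave: int, reg: int) -> bytes:
        # Build a function 0x03 request for a single register.
        frame = bytes([slave, 0x03]) + reg.to_bytes(2, "big") + (1).to_bytes(2, "big")
        request = frame + crc16_modbus(frame)
        with serial.Serial(port, baudrate=baud, bytesize=8,
                           parity=serial.PARITY_EVEN, stopbits=1, timeout=5) as ser:
            ser.write(request)
            # A good reply to a 1-register read is 7 bytes (an exception reply is 5).
            return ser.read(7)

    if __name__ == "__main__":
        for baud in (9600, 19200):
            reply = read_holding_register("COM3", baud, slave=1, reg=0)
            print(baud, "baud ->", reply.hex(" ") if reply else "timeout (no response)")

Running it at both baud rates against the slave would show whether a scripted master sees the same 9600-only timeouts.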
 
Just to confirm, when you say you're changing the baud rate, you're changing it on both devices, correct?
Yup...at 9600 on both sides (both sides are always set the same), some responses occur correctly (maybe 30%). At 19200 on both sides, there are no errors whatsoever.
What do you have the parity and number of stop bits set to for both devices? Make sure they match.
Yup, 8-E-1 on both.
It's possible that your wiring may be the issue. I recommend starting by making sure you have connected 3 wires (+, -, and reference) between the devices, assuming the devices have a reference terminal (it may also be called common, COM, REF, ground, GND, SG, etc.), and that the + and - lines use a twisted pair.
Yup, there is even a drain connected to ground on one side only.
At least for an initial test, remove all termination and biasing (pullup/pulldown). With such a short cable run those are not necessary anyway.
Another good test would be to use a laptop, a USB to RS-485 converter, and a Modbus master simulator tool to communicate to the field unit and see if you can duplicate the issue you're seeing with the different baud rates.
I sure wish I could do those things. Can't really, as this unit is in the field and "they" won't allow me to muck with it physically.

It's in the "it works, don't mess with it now" category. But I don't want this to happen in the future or at least would like to have some sort of theory of why it happens.

Thanks for your input!
 
Without being able to perform any testing to see what impacts the communication, it's going to be nearly impossible to determine exactly what is causing the communication problems. Typically when there are wiring/grounding/electrical issues, you see problems occurring the other way around - when the baud rate is too fast there are issues, but slowing the baud rate down fixes it.

The only thing I can think of where communications works better at faster baud rates is when the master's timeout setting is set too low. Since the higher baud rate results in the responses being transmitted faster, there are no more timeouts. Are you able to confirm that the master's timeout is actually set to what you think it's set to? If the master waits 500ms after successful reception of a response, is there 5.5s between transmissions when there is a timeout with the master's timeout setting set to 5s?
 
Ya, I've seen that too, where slowing down helps. That's what's so interesting about this one.

And yes, I can confirm the master times out at 5.5 seconds and sends a retry. And I also confirmed that absolutely no response comes back (not even a partial one). So for some reason, the slave device doesn't like the incoming packet.

With an 8-byte packet going out at 9600, it should take around 8 ms. At 19200, it would take 4 ms (all sloppy math and not taking into account any type of delay between characters, which hopefully doesn't exist). So that slave device would really just get an extra 4 ms to do something when changing to 19200, but I can't see how that helps it "like" the incoming packet. Maybe it then gets more time to calculate the CRC or something? When things do work at 19200, I don't have any data about how fast the response packet comes back. All I know is it comes back within 5 seconds (an eternity).
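
For what it's worth, redoing that math with 8-E-1 framing (1 start + 8 data + 1 parity + 1 stop = 11 bits per character) gives slightly larger numbers:

    # Exact transmit time for an 8-byte Modbus RTU request at 8-E-1 (11 bits/char).
    def frame_time_ms(num_bytes: int, baud: int, bits_per_char: int = 11) -> float:
        return num_bytes * bits_per_char / baud * 1000

    print(frame_time_ms(8, 9600))   # ~9.17 ms
    print(frame_time_ms(8, 19200))  # ~4.58 ms

Either way, the request only arrives about 4-5 ms sooner at 19200.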
 
The master's timeout setting is for how long it will wait for the slave's response. When reading a single register, the difference is negligible (7-byte response packet so 7ms versus 3.5ms), but when reading a lot of registers, say 100, this is where the timeout setting becomes important (assuming the master's timeout is based on receiving the whole packet, not just the first character). For reading 100 registers, it would be a 205-byte response packet, so it would be 205ms versus 102.5ms for the different baud rates.
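
Counting 11 bits per character for 8-E-1, the exact times come out a bit higher than those round figures, but the scaling with register count is the point:

    # Reply length and transmit time for a function 03 (read holding registers)
    # response, assuming 8-E-1 framing (11 bits per character).
    def response_bytes(num_registers: int) -> int:
        # slave address + function code + byte count + 2 bytes per register + 2-byte CRC
        return 3 + 2 * num_registers + 2

    def response_time_ms(num_registers: int, baud: int) -> float:
        return response_bytes(num_registers) * 11 / baud * 1000

    # prints: registers, reply bytes, ms at 9600, ms at 19200
    for regs in (1, 100):
        print(regs, response_bytes(regs),
              round(response_time_ms(regs, 9600), 1),
              round(response_time_ms(regs, 19200), 1))
    # -> 1 7 8.0 4.0
    # -> 100 205 234.9 117.4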

But it sounds like that can't be the cause of the issue you're seeing because the slave isn't even responding. How did you confirm this? Did you use an oscilloscope or a network analyzer, or are you trusting the master device reporting that it didn't receive a response?

There are a couple other timing issues that can be encountered that are baud rate dependent. Although they are fairly rare, they have to do with how long the devices drive the RS-485 bus after transmitting. In both scenarios, though, you would see a pattern such that an error occurs on every other packet. Here are the two scenarios:
  1. The master drives the bus too long. In this scenario, the master may be driving the bus when the slave begins transmitting its response, resulting in a collision on the bus and only a partial, corrupted response packet being received by the master. This may be reported by the master as a timeout, or possibly a CRC error, or some other packet error.
  2. The slave drives the bus too long. In this scenario, the slave may be driving the bus after sending its response, corrupting the master's next request. The slave then receives a partial, corrupted request packet and does not send a reply (as it should, since the CRC would not be correct).

Increasing the baud rate would reduce the amount of time the device drives the RS-485 bus, and may resolve the above issues. Alternatively, some Modbus devices have settings that let you configure an artificial request/response delay to overcome this issue. Since your master waits about 500ms after receiving a response to send the next request, the second scenario above would not be the cause of the issue you're seeing.
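
To put a hypothetical number on it: if a device releases its driver a fixed number of character times after the last stop bit (rather than after a fixed delay), the hold window shrinks in direct proportion to the baud rate:

    # Hypothetical illustration only: a driver-enable hold expressed in character
    # times (11 bits per character at 8-E-1) scales inversely with baud rate.
    def hold_time_ms(char_times: float, baud: int) -> float:
        return char_times * 11 / baud * 1000

    print(hold_time_ms(2, 9600))   # ~2.29 ms for a 2-character hold
    print(hold_time_ms(2, 19200))  # ~1.15 ms for the same hold

The 2-character hold is just an example value, not something measured from your devices.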

Are you able to discern any pattern as to when the timeouts occur or are they seemingly random?
 
The master's timeout setting is for how long it will wait for the slave's response. When reading a single register, the difference is negligible (7-byte response packet so 7ms versus 3.5ms), but when reading a lot of registers, say 100, this is where the timeout setting becomes important (assuming the master's timeout is based on receiving the whole packet, not just the first character). For reading 100 registers, it would be a 205-byte response packet, so it would be 205ms versus 102.5ms for the different baud rates.

But it sounds like that can't be the cause of the issue you're seeing because the slave isn't even responding. How did you confirm this? Did you use an oscilloscope or a network analyzer, or are you trusting the master device reporting that it didn't receive a response?
Trusting the master device's reporting... it has some limited statistics: # of outgoing packets, # of incoming, # of timeouts, # of CRC errors, # of length errors. In the 9600 baud scenario, # of timeouts increments and all the others are 0. A partial response would have given me a few CRC errors. There are also TX/RX LEDs, and the RX never blinks.

There are a couple other timing issues that can be encountered that are baud rate dependent. Although they are fairly rare, they have to do with how long the devices drive the RS-485 bus after transmitting. In both scenarios, though, you would see a pattern such that an error occurs on every other packet. Here are the two scenarios:
  1. The master drives the bus too long. In this scenario, the master may be driving the bus when the slave begins transmitting its response, resulting in a collision on the bus and only a partial, corrupted response packet being received by the master. This may be reported by the master as a timeout, or possibly a CRC error, or some other packet error.
  2. The slave drives the bus too long. In this scenario, the slave may be driving the bus after sending its response, corrupting the master's next request. The slave then receives a partial, corrupted request packet and does not send a reply (as it should, since the CRC would not be correct).

Increasing the baud rate would reduce the amount of time the device drives the RS-485 bus, and may resolve the above issues. Alternatively, some Modbus devices have settings that let you configure an artificial request/response delay to overcome this issue. Since your master waits about 500ms after receiving a response to send the next request, the second scenario above would not be the cause of the issue you're seeing.
I do know the master only drives the bus for about 1 ms after the completed packet goes out. The master also only has a small 120 µs delay between initially driving the bus and the first outgoing character. The idle state is also high. I'm able to see this in the lab.
If the slave drives the bus too long, I'd still see some good packets coming back from the slave. I would hope that if it did, it would at least have the courtesy to leave the line at idle so I wouldn't detect additional incoming characters on the master side. Even if it didn't, I'd still get something other than a timeout on the stats side because of the master's timeout length. And if that device is holding the transmitter for an extra 250 ms, it should probably be shot ;-)
Are you able to discern any pattern as to when the timeouts occur or are they seemingly random?
No pattern...but I can't get on that site anyway. I've actually been able to change the system to only ask for one holding register to get some other variables out of the equation. No change to the behavior.
Quite puzzling...I really need to get some sort of analyzer on that link in the field, but I'm SOL there.
Thanks again for your comments!
 