Technical Article

Methods and Algorithms in Error Checking for Serial Communications

June 10, 2023 by Damond Goodwin

Errors in communications often manifest as a result of electrical noise interrupting the data during the transmission process, leading to various methods for determining whether received data has been damaged.

Why Check For Errors in Serial Communications?

Data transmission over multiple network channels can create a source of errors in information sent through the networks. Error checking is a useful way to locate damaged data and prevent some of the problems associated with it. There are several different methods for determining whether received data has been damaged or not. These methods generally consist of some form of redundant information or a calculation to be completed on both the sending and receiving end of the serial transmission for data quality analysis.

Errors in communications often manifest as a result of electrical noise interrupting the data during the transmission process. Electrical noise corrupts the bits as they travel between sending and receiving units. As a result, the noise interferes with the binary signal, flipping bits, turning 1’s into 0’s and 0’s into 1’s. The errors can be found using different types of error-detecting code. These detection codes vary in complexity from the very simple parity check to more complex codes like the Cyclic Redundancy Check (CRC) and other methods of error detection.

 

Ethernet serial communication

Figure 1. Network communications are always prone to errors in industrial facilities. We can minimize them, but never eliminate them. Image used courtesy of Adobe Stock

 

Parity Bit Checks

Parity checking is the simplest form of error detection. It consists of a single bit added to the end of the transmitted data.

The parity bit is usually placed within a given byte of data, so the byte consists of 7 bits that carry data and a single parity bit, either even or odd, that is used to check the status of the data. Parity error checking can only detect data corruption; it cannot correct it.

Parity detects an error only when an odd number of bits have been flipped, since an even number of errors cancels out any change in the parity of the bit sum. This may be acceptable because, within a single byte, the chance of a single error is quite low, and the chance of two errors is much lower still.

There are two different parity checks that can be performed, either an even parity check or an odd parity check. Only the 1s that are present in the data set are counted, with the parity bit itself being included in the check.

For example, in an even parity check, a sent byte of data consists of 10101010, with the parity bit being the final 0 at the end of the byte. In an even parity check, the total number of 1s in the byte must be even for the data to pass the check on the receiving end. In our example, the seven data bits contain four 1s, so the final parity bit is a zero, keeping the number of 1s in the byte at 4, which is an even number.

Using the same byte example in an odd parity check, the receiving device of the transmission would conclude that the byte had been corrupted. This is because the final bit in 10101010 makes the number of 1s in the byte even, yielding a ‘false’ in the case of an odd parity check. For systems using an odd parity check, the same data 1010101 should be sent with a 1 as the final bit, making the number of 1s in the transmission odd or ‘true’ in the case of this particular data transmission. With the same data, a correct odd parity transmission would look like 10101011.
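The parity rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: the function names (parity_bit, add_parity, parity_ok) are hypothetical, and bits are handled as text strings purely for readability.

```python
def parity_bit(data_bits: str, even: bool = True) -> str:
    """Return the parity bit ('0' or '1') for a string of data bits."""
    ones = data_bits.count("1")
    if even:
        # Even parity: the total number of 1s, parity bit included, must be even.
        return "0" if ones % 2 == 0 else "1"
    # Odd parity: the total number of 1s, parity bit included, must be odd.
    return "1" if ones % 2 == 0 else "0"


def add_parity(data_bits: str, even: bool = True) -> str:
    """Append the parity bit to the end of the data bits."""
    return data_bits + parity_bit(data_bits, even)


def parity_ok(frame: str, even: bool = True) -> bool:
    """Check a received frame (data bits plus parity bit)."""
    ones = frame.count("1")
    return ones % 2 == 0 if even else ones % 2 == 1


print(add_parity("1010101", even=True))    # 10101010
print(add_parity("1010101", even=False))   # 10101011
print(parity_ok("10101010", even=False))   # False: fails an odd parity check
print(parity_ok("00101010", even=True))    # False: one flipped bit is detected
print(parity_ok("01101010", even=True))    # True: two flipped bits go unnoticed
```

The last line shows the limitation noted earlier: flipping two bits leaves the count of 1s even, so the corrupted byte still passes the even parity check.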

 

Motor drive units

Figure 2. Motors and other inductive devices throughout the industry can tamper with data integrity. Image used courtesy of Adobe Stock

 

Checksum

A checksum error check is more complex than a parity check, but has the benefit of a higher degree of precision, with less chance of a missed error. A checksum works by sending a ‘checksum’ to accompany the sent data.

The sending unit creates a checksum by adding the data words together using 1s complement arithmetic. In 1s complement arithmetic, any time a carry-over digit is produced beyond the most significant bit, it is added back into the sum, meaning the sum stays the same length as the data words.

All of the data is added to get a sum, and this sum is then complemented to create a checksum. In boolean terms, ‘complement’ means that all the 1s are changed to 0s and all the 0s are changed to 1s.

The checksum is sent with the data to be accepted by the receiver. The receiver then takes the data and performs a summation of all received data. The received data summation is then added to the checksum, and a final complement of this last summation is performed. If there are no errors in the data, then the resulting summation’s complement will always be 0.

Checksum Example

Sent data #1 = 11001001

Sent data #2 = 00010110

Sent Sum = 11011111

Checksum = 00100000

 

Received data #1 = 11001001

Received data #2 = 00010110

Checksum = 00100000

Received sum = 11111111

Received sum complement = 00000000

 

In the example, the complement of the sum of all received data plus the checksum is zero, so the receiver can conclude that no error has been detected in the data.
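The same procedure can be sketched in Python, assuming 8-bit data words as in the example. The function names (ones_complement_add, make_checksum, verify_checksum) and the width parameter are hypothetical choices made for illustration; real protocols define their own word sizes and framing.

```python
def ones_complement_add(a: int, b: int, width: int = 8) -> int:
    """Add two words, folding any carry out of the top bit back into the sum."""
    mask = (1 << width) - 1
    total = a + b
    while total > mask:
        total = (total & mask) + (total >> width)
    return total


def make_checksum(words, width: int = 8) -> int:
    """Sender side: sum all data words, then complement the sum."""
    mask = (1 << width) - 1
    total = 0
    for word in words:
        total = ones_complement_add(total, word, width)
    return ~total & mask


def verify_checksum(words, checksum: int, width: int = 8) -> bool:
    """Receiver side: sum data and checksum; the complement must be zero."""
    mask = (1 << width) - 1
    total = checksum
    for word in words:
        total = ones_complement_add(total, word, width)
    return (~total & mask) == 0


data = [0b11001001, 0b00010110]
checksum = make_checksum(data)
print(format(checksum, "08b"))                            # 00100000
print(verify_checksum(data, checksum))                    # True
print(verify_checksum([0b11001011, 0b00010110], checksum))  # False: one bit flipped
```

Folding the carry back in with a loop keeps the running sum at the chosen width, which is the "carry-over digit is added to the sum" behavior described above.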

 

Network Cables

Figure 3. All network types are susceptible to inductively generated interference; the question is, how can we detect problems? Image used courtesy of Adobe Stock

 

Cyclic Redundancy Check (CRC)

Cyclic Redundancy Check (CRC) is another form of error detection that relies on polynomial division to check not only whether the correct number of 1’s and 0’s is present but also whether their order is correct. Essentially, the data to be checked is treated like a polynomial and then divided by a chosen divisor. The value of the divisor itself is important to the accuracy and usefulness of the check.

The binary information has to be converted into a polynomial equation in order to fit into the check method. In order to do this, the binary numbers become the coefficients in a polynomial expression, the length of which exactly matches the length of the code to be divided. An example of how the process works is shown below. Polynomial conversion is useful because it creates a way of dividing without carrying numbers or borrowing in binary math, as subtraction requires. This creates the XOR division method that is used for CRC redundancy checks.

 

Example of CRC Calculation

Let's assume the binary number 101101 as our string of data bits and use individual binary bits as coefficients in a polynomial, set up for polynomial division. It would look something like this:

$$x^5+0x^4+x^3+x^2+0x+1$$

The coefficients in this polynomial can only be 0 or 1, and the arithmetic is carried out modulo 2. In this system, addition and subtraction are the same operation, a bitwise XOR, which is simple for computers to perform.

Next, we would need to divide it by another polynomial or ‘key’; let us use 1110. The resulting key would look something like this:

$$x^3+x^2+x+0$$
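As a small illustration of the conversion, the following Python sketch maps a bit string to its polynomial form. The helper name bits_to_polynomial is hypothetical, and unlike the expressions above it simply omits the terms whose coefficient is 0.

```python
def bits_to_polynomial(bits: str) -> str:
    """Write a bit string as a polynomial, using each bit as a coefficient."""
    degree = len(bits) - 1
    terms = []
    for position, bit in enumerate(bits):
        if bit != "1":
            continue  # a 0 coefficient contributes nothing
        power = degree - position
        if power == 0:
            terms.append("1")
        elif power == 1:
            terms.append("x")
        else:
            terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(bits_to_polynomial("101101"))  # x^5 + x^3 + x^2 + 1
print(bits_to_polynomial("1110"))    # x^3 + x^2 + x
```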

Next, the division is performed and the remainder is found. The remainder is attached to the data and used in the redundancy check. The receiver divides the received code word by the same key. If the remainder of that division is zero, no corruption has been detected. If there is a non-zero remainder, data corruption has occurred.

 

Sender Calculation

Using our previously assumed example data, the sent information would be calculated something like this:

Actual Data: 101101

Dividend: 101101000. This value results from a combination of the actual data with [(number of bits in the key) - 1] number of 0s attached at the end. Our key (below) contains 4 bits, so the data needs [4-1] or three 0s added to it for the calculation.

Divisor (key): 1110

Now, the calculation can occur to determine the code word:

101101000 ÷ 1110 = 111100, using XOR division, with a remainder of 000. In this particular example, the padded data happens to divide evenly by the key.

So the coded word to be sent would become 101101000, because the three-bit remainder is placed in the final three spaces on the coded word.

So the following values are involved in the transmission; the key must be known to both the sender and the receiver:

Actual data: 101101

Key: 1110

Code word: 101101000

 

Receiver Calculation

The receiver would then take the code word and divide it by the same key, repeating the division the sender performed.

The remainder should be zero and the actual data can be trusted, unless the data has been corrupted, in which case a non-zero remainder will appear.

101101000 ÷ 1110 = 111100, using XOR division, with a remainder of 0.
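The sender and receiver steps can be checked with a short Python sketch of XOR long division. It is a minimal illustration under stated assumptions: the function names (mod2_divide, crc_encode, crc_check) are hypothetical, bits are passed as text strings, and the key is assumed to begin with a 1. Practical CRC implementations typically work on bytes with standardized keys and lookup tables.

```python
def mod2_divide(dividend: str, key: str) -> tuple[str, str]:
    """Return (quotient, remainder) of XOR long division on bit strings."""
    n = len(key)
    key_value = int(key, 2)
    window = 0          # running partial remainder
    quotient = []
    for index, bit in enumerate(dividend):
        window = (window << 1) | int(bit)   # bring down the next bit
        if index >= n - 1:
            if window >> (n - 1):           # leading bit set: "subtract" the key
                window ^= key_value
                quotient.append("1")
            else:
                quotient.append("0")
    return "".join(quotient), format(window, f"0{n - 1}b")


def crc_encode(data: str, key: str) -> str:
    """Sender: pad the data with zeros, divide, and append the remainder."""
    padded = data + "0" * (len(key) - 1)
    _, remainder = mod2_divide(padded, key)
    return data + remainder


def crc_check(code_word: str, key: str) -> bool:
    """Receiver: the code word passes if it divides by the key with no remainder."""
    _, remainder = mod2_divide(code_word, key)
    return int(remainder, 2) == 0


key = "1110"
code_word = crc_encode("101101", key)
assert crc_check(code_word, key)            # an intact code word passes

# Flip one bit to simulate noise on the line; the check now fails.
flipped = "0" if code_word[3] == "1" else "1"
corrupted = code_word[:3] + flipped + code_word[4:]
assert not crc_check(corrupted, key)
```

Printing the result of mod2_divide for the padded data and for the code word is an easy way to confirm the quotient and remainder of a worked example by hand.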

 

Combat Communication Errors

Error checks are an important part of data transmission and help to prevent corrupted data from being accepted by the receiver. There are many different ways that data can be corrupted, but in the end, it boils down to misplaced or switched 1s and 0s. Simple parity checks and checksums catch many of these errors, but because of their simplicity, certain mistakes can’t be found.

More thorough error-checking methods such as the Cyclic Redundancy Check (CRC), along with others like the Longitudinal Redundancy Check (LRC), can aid in managing errors.