Technical Article

Understanding XML and Data Exchange Formats for Industrial Settings

August 17, 2023 by Michael Levanduski

Data exchange between devices, machines, and embedded systems is a foundation of every organization’s Industry 4.0 implementation process.

Where does a data exchange format fit in Industry 4.0 exactly? Why should we care about how data flows through the industrial control network? These are some questions that may come to mind when reading about such a simple component of the modern Industrial IoT (IIoT) space.

 

What is Data Exchange?

The data exchange component of an automation system is critical to the professionals who will build and ultimately maintain Industry 4.0 solutions. The data exchange format falls in the middle of the larger Industry 4.0 scope:

  • Architectural Pattern - The strategy that defines how industrial systems will be designed and how components (machines, sensors, etc.) will interact with other components. This component is essential to creating a modular and maintainable solution.
  • Data Exchange Format - The agreed-upon structure that a component from one system uses to package data so that another system can interpret it.
  • Communication Protocol - Facilitates the transfer of data from one system to another using a commonly accepted method.

A good example of the larger scope is an IIoT deployment built on MQTT-enabled edge devices. The MQTT communication protocol moves data in an event-driven, publish-subscribe architectural pattern, and the payload is commonly formatted as JavaScript Object Notation (JSON), although MQTT itself does not mandate a payload format. Let’s dive into one of the original formats for data exchange, XML.

 


Figure 1. The interconnected nature of modern manufacturing requires a strong understanding of data exchange formats. Image used courtesy of Pilz GmbH & Co. KG

 

Origin of Extensible Markup Language (XML)

Extensible Markup Language was initially developed in the 1990s as a child of Standard Generalized Markup Language (SGML). SGML, developed in the 1980s, is considered the parent of many markup languages, including HTML.

SGML found popularity primarily in government and aerospace applications since it could conform to the complex structures of the industry's documentation and data format requirements. Conversely, HTML gained popularity due to its simplicity in having a defined set of simple tags or elements that many industries could adopt. However, XML and HTML serve different purposes.

XML was developed as a simpler implementation of SGML with the goal of storing and transporting data in an agreed-upon structure. Alone, an XML file will not trigger any action; it is not capable of doing any work. Organizations and users must define scripts or applications to compile, send, receive, and parse the XML file. Thus, an XML file is simply a store or package of data.
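To illustrate the point that an XML file does nothing until a program builds or reads it, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function names, the element names, and the "press-01" identifier are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def build_status_message(machine_id: str, status: str) -> bytes:
    """Sender side: package two values into a small XML document."""
    root = ET.Element("machine")  # hypothetical root element name
    ET.SubElement(root, "id").text = machine_id
    ET.SubElement(root, "status").text = status
    return ET.tostring(root, encoding="utf-8")


def read_status_message(payload: bytes) -> dict:
    """Receiver side: unpack the document into a plain dictionary."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


payload = build_status_message("press-01", "Running")
message = read_status_message(payload)
print(payload.decode("utf-8"))
print(message)
```

The XML itself is only the package; all of the work happens in the two functions that create and interpret it.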

 


Figure 2. XML is a primary language used for data exchange. Image used courtesy of Adobe Stock

 

Differences Between XML & HTML

Although both languages were children of SGML, there are a few key differences to note. Primarily, the paradigm of XML is to move data, whereas HTML is concerned with the presentation of data to a user. Furthermore, tags in HTML are predefined. Developers across the world must conform to using <head></head> and <body></body> tags, for example. In contrast, tags in XML are custom defined at the user or organization level to suit business needs. This highly flexible nature of XML is where the term extensible is derived from.

 

XML Declaration

The XML declaration is optional and provides metadata about the file. When present, it must be the very first thing in the file. A very common form is shown below, where the XML version and the character encoding (here UTF-8, an encoding of Unicode) are announced:

<?xml version="1.0" encoding="UTF-8"?>
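As a rough sketch of how the declaration is used in practice, the Python snippet below first asks the standard xml.etree.ElementTree serializer to write a declaration, then parses a document whose declaration announces a different encoding. The element names are placeholders, and the xml_declaration argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Serialize an empty <machine> element with an explicit declaration.
# The xml_declaration argument of tostring() needs Python 3.8 or later.
document = ET.tostring(
    ET.Element("machine"), encoding="UTF-8", xml_declaration=True
)
first_line = document.split(b"\n", 1)[0]
print(first_line.decode("ascii"))

# The declared encoding tells the parser how to decode the bytes that follow.
# Here the degree sign is stored as a single ISO-8859-1 byte.
latin1_document = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><unit>\u00b0F</unit>'
).encode("iso-8859-1")
unit_text = ET.fromstring(latin1_document).text
print(unit_text)
```

Without the declaration, a parser assumes UTF-8 (or UTF-16 when a byte order mark is present), so the second document would not be read correctly.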

 

XML Root Element

Every XML document has exactly one root element, defined by a start tag and a matching end tag; in the example below, these are <machine> and </machine>. Everything in between the start and end root tags is considered a child XML element of the parent root element.

<?xml version="1.0" encoding="UTF-8"?>
<machine>

</machine>
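The single-root rule can be checked with a short Python sketch. The helper name is_well_formed is hypothetical; it simply reports whether the standard library parser accepts the text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True when the parser accepts the text as one XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


single_root = "<machine><status>Running</status></machine>"
two_roots = "<machine></machine><machine></machine>"

print(is_well_formed(single_root))
print(is_well_formed(two_roots))
```

The first string is accepted, while the second is rejected because a second top-level element follows the root.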

 

XML Child Elements

An element consists of the start tag, the end tag, and everything in between. An element can contain text, other elements, or a mix of both, and its start tag can carry one or more attributes. For example:

<?xml version="1.0" encoding="UTF-8"?>
<machine>
    <make>Nissei</make>
    <model>NEX100</model>
    <serialNumber>123456789</serialNumber>
    <status>Running</status>
    <parameters>
        <parameter name="Pressure">
            <value>3000</value>
            <unit>psi</unit>
        </parameter>
        <parameter name="Temperature">
            <value>400</value>
            <unit>fahrenheit</unit>
        </parameter>
    </parameters>
</machine>

 

The <parameters></parameters> element contains multiple <parameter></parameter> child elements. Each <parameter> element carries a name attribute that identifies the parameter. Inside each one, between the start <parameter> and end </parameter> tags, the <value></value> and <unit></unit> elements contain text that gives the parameter's numeric value and its unit of measure.
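One way a receiving application might read the machine document above is sketched here in Python with the standard xml.etree.ElementTree module. The variable names are illustrative, and converting <value> to an integer is an assumption based on the sample data.

```python
import xml.etree.ElementTree as ET

# The machine document from the example, held as bytes so that the
# encoding declaration is honored by the parser.
MACHINE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<machine>
    <make>Nissei</make>
    <model>NEX100</model>
    <serialNumber>123456789</serialNumber>
    <status>Running</status>
    <parameters>
        <parameter name="Pressure">
            <value>3000</value>
            <unit>psi</unit>
        </parameter>
        <parameter name="Temperature">
            <value>400</value>
            <unit>fahrenheit</unit>
        </parameter>
    </parameters>
</machine>
"""

root = ET.fromstring(MACHINE_XML)
print(root.tag, root.findtext("make"), root.findtext("model"))

# Collect each parameter as name -> (value, unit).
readings = {}
for parameter in root.findall("./parameters/parameter"):
    name = parameter.get("name")              # attribute on the start tag
    value = int(parameter.findtext("value"))  # element text is always a string
    unit = parameter.findtext("unit")
    readings[name] = (value, unit)

print(readings)
```

Attributes are read with get(), while the text inside child elements is read with findtext(); both return strings, so numeric conversion is left to the application.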

 

Validation of XML Files

Given the highly flexible nature of XML, files passed from one system to another need to be validated to ensure that the data is structured correctly. This is accomplished with an XML Schema Definition (XSD) file. An example of such a file for the above XML implementation could be:

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="machine">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="make"/>
        <xs:element type="xs:string" name="model"/>
        <xs:element type="xs:int" name="serialNumber"/>
        <xs:element type="xs:string" name="status"/>
        <xs:element name="parameters">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="parameter" maxOccurs="unbounded" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element type="xs:short" name="value"/>
                    <xs:element type="xs:string" name="unit"/>
                  </xs:sequence>
                  <xs:attribute type="xs:string" name="name" use="optional"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

 

The first line, the opening <xs:schema> tag with its xmlns:xs="http://www.w3.org/2001/XMLSchema" attribute, declares the document as an XML schema and binds the xs prefix to a namespace URI maintained by the World Wide Web Consortium (W3C). The schema components used in the rest of the file come from that namespace.

Each <xs:element> declares an XML element, and the <xs:attribute> declares the name attribute of the parameter elements. Another notable component is <xs:complexType>, which indicates that an element contains other elements or carries attributes.
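Python's standard library does not include an XSD validator, so the sketch below is not schema validation in the strict sense. It is a hand-written approximation of a few rules from the schema above (child order from xs:sequence, and the integer ranges of xs:int and xs:short), intended only to show what kind of checks a validator performs. The function names are hypothetical; in practice, a schema-aware library such as lxml would load the XSD file directly.

```python
import xml.etree.ElementTree as ET

# Child elements of <machine>, in the order required by xs:sequence.
EXPECTED_CHILDREN = ["make", "model", "serialNumber", "status", "parameters"]


def fits_range(text, low, high):
    """Return True when text parses as an integer inside [low, high]."""
    try:
        return low <= int(text) <= high
    except (TypeError, ValueError):
        return False


def check_machine(xml_text: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]

    if root.tag != "machine":
        return [f"unexpected root element <{root.tag}>"]

    child_tags = [child.tag for child in root]
    if child_tags != EXPECTED_CHILDREN:
        return [f"expected children {EXPECTED_CHILDREN}, found {child_tags}"]

    problems = []
    # xs:int is a 32-bit signed integer.
    if not fits_range(root.findtext("serialNumber"), -2**31, 2**31 - 1):
        problems.append("serialNumber is not a valid xs:int")

    for parameter in root.find("parameters"):
        if parameter.tag != "parameter":
            problems.append(f"unexpected element <{parameter.tag}>")
            continue
        if [child.tag for child in parameter] != ["value", "unit"]:
            problems.append("parameter must contain value, then unit")
            continue
        # xs:short is a 16-bit signed integer.
        value_text = parameter.findtext("value")
        if not fits_range(value_text, -2**15, 2**15 - 1):
            problems.append(f"value {value_text!r} is not a valid xs:short")
    return problems


VALID = (
    "<machine><make>Nissei</make><model>NEX100</model>"
    "<serialNumber>123456789</serialNumber><status>Running</status>"
    "<parameters><parameter name='Pressure'>"
    "<value>3000</value><unit>psi</unit>"
    "</parameter></parameters></machine>"
)
TOO_LARGE = VALID.replace("<value>3000</value>", "<value>40000</value>")

print(check_machine(VALID))
print(check_machine(TOO_LARGE))
```

The second document is well-formed XML, yet it fails the check because 40000 does not fit in the 16-bit xs:short type that the schema assigns to <value>.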

 

XML Summary

XML advantages include readability, customization, the ability to represent hierarchical structures, and validation via XSD. However, the complexity that comes with extensibility can make large XML documents demanding for people to work with and resource-intensive for machines to parse. XML is still used widely in industry; however, JavaScript Object Notation (JSON) is often preferred when possible.
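For a rough side-by-side view of the two formats, the Python sketch below serializes one pressure reading as both JSON and XML using only the standard library. The dictionary layout for the JSON version is an assumption made for this comparison, not a format defined in this article.

```python
import json
import xml.etree.ElementTree as ET

# One reading, using the values from the machine example.
reading = {"name": "Pressure", "value": 3000, "unit": "psi"}

# JSON: the dictionary maps directly onto the text format.
as_json = json.dumps(reading)

# XML: the same reading as a <parameter> element with a name attribute.
parameter = ET.Element("parameter", name=reading["name"])
ET.SubElement(parameter, "value").text = str(reading["value"])
ET.SubElement(parameter, "unit").text = reading["unit"]
as_xml = ET.tostring(parameter, encoding="unicode")

print(as_json)
print(as_xml)

# JSON returns the number as a number; XML returns text that the
# receiving application must convert itself.
json_value = json.loads(as_json)["value"]
xml_value = ET.fromstring(as_xml).findtext("value")
print(type(json_value).__name__, type(xml_value).__name__)
```

Neither form is inherently better; the XML version can be checked against an XSD, while the JSON version needs less code to produce and consume.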