Technical Article

Machine-readable vs. Human-readable Data

July 21, 2021 by Muhammad Asim Niazi

Industrial manufacturers use two main data types—machine-readable and human-readable—to analyze for maintenance, monitoring, or other applications.

Data is an essential asset for all organizations and forms the basis of many developmental plans. Analytics interpret information, and algorithm application depends on the correctness of data and the authenticity of the source. The representation of data into a correct format without destroying the original value is a critical part of the data management system. It is considered a vital part of data studies.

 

Figure 1. A representation of machine-readable and human-readable data. Image used courtesy of ResearchGate

 

There are two data formats: machine-readable and human-readable. Let’s dive into how their analysis and applications differ.

 

Machine-readable Data

The machine-readable format is designed for devices and machines. This format is complex for humans to understand. Specialized instruments are necessary to read the content of machine-readable data. Similarly, specialized devices are also necessary to generate machine-readable data. For the machine to read data, the information must follow the approved format understandable for the machines.

The data presented in the machine-readable format can be automatically extracted and used for further processing and analysis without human involvement. The machine-readable data is automatically shared using systems allowing automatic data feed.

The standard formats for machine-readable data include the following.

  • Comma-separated variables (CSV): The CSV is a standard format representing machine-readable data. The most common database software and systems, such as Microsoft Excel, provide this format. It stores data in tabular text form and can efficiently execute machine-to-machine transfer without human involvement. It has the extension *.CSV.
  • Resource description framework (RDF): RDF data is represented as web resources and allows for linking data sets in a format, a feature not provided in CSV format. It enables combining similar data from multiple resources to avoid duplicity and prevent data misses. 
  • JavaScript object notation (JSON): JSON is derived from JavaScript language and is simple to be read by computers and machines. JSON is used to serialize structured data, which means transforming the given data into a format understandable by the receiver end. It is an open format that is machine- and language-independent.

 

Figure 2. Sample JSON format. Image used courtesy of SAP

 

Human-readable Data

Human-readable data can be understood and interpreted by humans. The data interpretation does not require any specialized equipment or device, nor specialized equipment creates the data format. It has a natural language (e.g., English, French, etc.), and the data representation is unstructured.

An example of a human-readable format is a PDF document. Although PDF is a digital media, its data representation does not require any specialized equipment or computer to interpret. Moreover, the information contained in the PDF document is usually intended toward humans, not machines.

 

Difference Between Machine-readable and Human-readable Data

Following are some differences between machine-readable and human-readable formats.

 

Data Processing

Analytics can quickly process the machine-readable data. Machine-readable allows the algorithm to extract the data for further use as defined by the requirements. The data point identification and extraction depend on the machine-readable format, such as CSV. For example, pressure values indicated in machine-readable format can be combined and stored at a single location for further processing, such as comparing against other variables like time or different physical values.

 

Figure 3. A depiction of different human-readable data.

 

In human-readable format, the data or information cannot be extracted in a grouped form, nor can the data be further processed by computer algorithms or analytics. For example, humans can read, understand, and analyze the PDF file or an image, but this data format cannot be fed to analytics or algorithms.

 

Automatic Syndication

The machine-readable format can be easily shared via automatic syndication feeds. The user must only have an appropriate software application that accepts the syndicated feeds and the user’s request to receive the feed. These automatic feeds files are typically extensible markup language (XML) files or another supported machine-readable format.

Human readable, on the other hand, cannot be automatically syndicated. The whole document format is manually updated and requires manual access to read its content.

 

FAIR Data Principles

As mentioned above, the computer algorithm processes the machine-readable data, which records the data in standard formats. This makes the data compatible with FAIR data principles. FAIR data principles are:

  • Findable: The data is available and automatically discoverable.
  • Accessible: The user is authorized to use the data.
  • Interoperable: The data integrates into other data processing software or applications.
  • Reusable: The extracted data can be reused for various information or can be combined in different formats.

Human-readable data does not adhere to FAIR data principles as computer algorithms or analytics cannot process, identify, or analyze it. Human-readable data places different visual formats such as tables, texts, and numbers representing data for humans to understand. The human-readable data cannot be fed to any data management software, where the data is findable, accessible, interoperable, or reusable.

To conclude, machine-readable data cannot be easily read by humans. However, the opposite is also true. Human-readable data cannot easily be read by machines. Each form of data has different formats and use cases, especially in the industrial “big data” sector. Continue onto the next article about machine-readable versus human-readable data in fluid dynamics.