
PygWalker Data Visualization Library - Part 2, Advanced Charting Display

September 12, 2023 by Michael Levanduski

PygWalker is a useful tool for visualizing data, with advanced features that help extract more insight from it. This tutorial continues using the milling machine predictive maintenance dataset.

In our first hands-on demonstration article about pygWalker, we created a simple scatter plot.

In this second article, we will explore more advanced visualization capabilities that pygWalker offers. These include:

  • Data Tab
  • Bar Plot
  • Filtering
  • Sorting
  • New Chart
  • Coloring, Opacity, and Size Options
  • Multiple Plots

To see this demonstration in action, use the Google Colab notebook (created by the author) to experiment with the data visualization steps below.

 

Data Tab

The data tab offers a more user-friendly alternative to the pandas df.head() method for previewing datasets. It can be accessed by clicking the Data button, highlighted in yellow in the image below.

For all figures in this article, you can right-click and 'Open image in a new tab' to see the examples in greater detail.

 


Figure 1. Data tab for previewing datasets. Author’s image

 

Notice that the columns are classified as either dimensions or measures. A dimension is a qualitative or categorical attribute used to group, filter, or describe the data. A good example in the milling machine dataset is machine failure: the context of every other column changes drastically depending on whether there was a machine failure (1) or not (0). Splitting the data along a dimension like this is often where insights come from. Measures, by contrast, are quantitative (numeric) values that are most useful when aggregated, for example as a sum, mean, or count.
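For readers who also work in code, the sketch below is a rough pandas counterpart to the Data tab. The rows are hypothetical values invented for illustration, and the column names are assumed to match the dataset headers used in this article. The dtype-based split into dimensions and measures is our own simple heuristic, not pygWalker's classification logic. In the notebook, the interactive UI itself is usually opened with `pyg.walk(df)` after `import pygwalker as pyg`.

```python
import pandas as pd

# Hypothetical rows shaped like the milling machine dataset (values invented).
df = pd.DataFrame({
    "Type": ["L", "M", "L", "H"],
    "Torque [Nm]": [42.8, 46.3, 49.4, 39.5],
    "Tool wear [min]": [0, 3, 5, 7],
    "Machine failure": [0, 0, 1, 0],
})

# Plain-text preview, the code equivalent of browsing the Data tab.
print(df.head())

# Machine failure is stored as 0/1 integers but describes a category, so mark it as one.
df["Machine failure"] = df["Machine failure"].astype("category")

# Heuristic split: non-numeric (text or categorical) columns act as dimensions,
# numeric columns act as measures.
dimensions = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
measures = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
print("Dimensions:", dimensions)
print("Measures:", measures)
```

Casting the 0/1 column to a categorical type is what moves Machine failure from the measures list to the dimensions list here.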

 

Bar Plot

Bar plots are useful for spotting trends and for comparing a measure across the categories of a dimension. For example, let’s see how the mean Tool wear [min] time stacks up across the different product types.
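In pandas terms, this bar plot is a group-by on the product type followed by a mean of the tool wear. A minimal sketch, using hypothetical rows rather than the real dataset:

```python
import pandas as pd

# Hypothetical sample rows (invented; not taken from the real dataset).
df = pd.DataFrame({
    "Type": ["L", "L", "M", "M", "H", "H"],
    "Tool wear [min]": [100, 150, 110, 120, 90, 130],
})

# One bar per product type; each bar's height is the mean tool wear for that type.
mean_wear = df.groupby("Type")["Tool wear [min]"].mean()
print(mean_wear)
```

Each value in the resulting series corresponds to the height of one bar in the chart.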

To make charts in the Google Colab notebook, drag a field from the list on the left onto the X-Axis or Y-Axis on the right. To resize the chart, change the layout mode (circled in purple below) to Fixed.

 


Figure 2. Adding data series and editing the chart space.

 


Figure 3. Using bar plots to compare and trend data series. Author’s image

 

There is not much of a difference between the product types. To drill down further, let’s add a filter.

 

Filtering

As engineers and technicians, we are likely curious about what this data looks like when the machine fails. Let's add Machine Failure to the Filter option and select only “1”:

 


Figure 4. Creating a rule (criteria) for filtering. Author’s image

 

This rule selects only the entries that contain a 1 in the machine failure column at the far right of Figure 1.

 


Figure 5. Filtered dataset matching machine failure to tool wear. Author’s image

 

This graph offers slightly more insight. Product type “L” had the highest mean Tool wear [min] time at the time of failure. We might use this data to update preventative maintenance schedules on the milling machine when it runs product type “L”.
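The filter-then-aggregate step can be sketched in pandas as a boolean mask followed by the same group-by. The rows are hypothetical, and the `Machine failure` column name is assumed from the dataset headers:

```python
import pandas as pd

# Hypothetical sample rows (invented; not taken from the real dataset).
df = pd.DataFrame({
    "Type": ["L", "L", "L", "M", "M", "H", "H"],
    "Tool wear [min]": [100, 210, 200, 120, 180, 90, 190],
    "Machine failure": [0, 1, 1, 0, 1, 0, 1],
})

# Keep only the rows where the machine failed, like a filter rule that selects "1".
failures = df[df["Machine failure"] == 1]

# Mean tool wear per product type, computed over the failure rows only.
mean_wear_at_failure = failures.groupby("Type")["Tool wear [min]"].mean()
print(mean_wear_at_failure)
```

Because the filter runs before the aggregation, rows without a failure never contribute to the means.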

 

Sorting

Another feature of pygWalker is the ability to sort visualizations in ascending or descending order. Notice how product type “L” had the tallest Tool wear [min] bar in our plot? We can sort the y-axis in descending order to arrange the product types by bar height, using the highlighted button below:

 


Figure 6. Sorting icon on the toolbar. Author’s image

 

Sorting arranges the bars in ascending or descending order, which helps us prioritize maintenance or decide where further study is needed.

 


Figure 7. Sorting the data to show the highest priority items at the top. Author’s image

 

Now the “L” product type is at the top of the bar graph with “H” and “M” following in order of decreasing magnitude.

As with most sort functions, we can select the ascending sort button to flip the direction of the sort.
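Sorting the aggregated values in pandas mirrors the two toolbar buttons: `sort_values` with `ascending=False` or `ascending=True`. The per-type means below are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical per-type means, standing in for the filtered bar plot's values.
mean_wear = pd.Series({"H": 190.0, "L": 205.0, "M": 180.0}, name="Tool wear [min]")

# Descending sort: the tallest bar comes first.
desc = mean_wear.sort_values(ascending=False)

# Ascending sort reverses the order.
asc = mean_wear.sort_values(ascending=True)

print(desc.index.tolist())  # ['L', 'H', 'M']
print(asc.index.tolist())   # ['M', 'H', 'L']
```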

 

Creating a New Chart

If we want to explore the data further without overwriting the bar plot above, we can use the “+ New” option:

 


Figure 8. Creating a new chart tab. Author’s image

 

This will take us to a clean slate while preserving the work completed in the “Chart 1” tab.

 

Coloring, Opacity, and Size Options

These options add visual “pop” to the dashboard and can help reveal complex insights within the dataset. To illustrate, let’s use a more complex example. In Chart 2, my goals are as follows:

  • Display Torque [Nm] on the x-axis
  • Display Tool Wear [min] on the y-axis
  • Filter by Machine Failure = 1
  • Color data points by Product Type
  • Remove the aggregate option to display actual row instances

The result is the following chart:

 


Figure 9. New chart with all display options. Author’s image

 


Figure 10. The new chart, zoomed in showing only the data series in different colors. Author’s image

 

At a glance, we can draw some rough insights from this chart. The data points appear to be concentrated in the top-right corner of the graph, which suggests that machine failures occurred more frequently when both Torque and Tool wear were high.

Coloring by Type also shows that more of the machine failures appear to involve “L” product types than “M” or “H” product types. The caveat is that the product type column is skewed toward “L”: 50% of all observations are “L” product types. The larger number of “L” failures may therefore simply reflect that the milling machine runs “L” products more often, rather than a higher failure rate for that type. Isn’t it amazing how much can be interpreted from a visual created with a few drag-and-drop actions?
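The volume caveat can be checked by dividing each product type's failure count by how often that type runs. A minimal sketch with hypothetical rows; only the 50% share of “L” is taken from the text above, and the rest of the split is invented for illustration:

```python
import pandas as pd

# Hypothetical rows: "L" runs most often, so it can log more failures
# without failing at a higher rate.
df = pd.DataFrame({
    "Type": ["L"] * 10 + ["M"] * 6 + ["H"] * 4,
    "Machine failure": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
                       + [1, 0, 0, 0, 0, 0]
                       + [1, 0, 0, 0],
})

# Count failures and total runs per type, then normalize to a rate.
summary = df.groupby("Type")["Machine failure"].agg(failures="sum", runs="count")
summary["failure rate"] = summary["failures"] / summary["runs"]
print(summary)
```

In this made-up sample, “L” logs the most failures while “H” has the highest failure rate, which is exactly the distinction the caveat warns about.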

Instead of using color to display product Type, we can use the opacity and size options. Using opacity yields the graph below:

 


Figure 11. Changing the opacity of a data series. Author’s image

 

The opacity chart looks similar to the color chart. However, it uses a single color at different levels of transparency to distinguish the product types of the plotted Torque [Nm] points.

Lastly, the size option yields a graph below:

 


Figure 12. Changing the size of a data series can be more visually engaging than opacity. Author’s image

 

It uses circles of different sizes to distinguish the product types of the plotted Torque [Nm] points.
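pygWalker handles these encodings internally, and the exact scales it uses are not covered here. As a general illustration only, the hypothetical helper `category_levels` below assigns each product type an evenly spaced opacity or marker size, which is the basic idea behind both charts:

```python
def category_levels(categories, low, high):
    """Hypothetical helper: map each distinct category to an evenly spaced value in [low, high]."""
    distinct = sorted(set(categories))  # alphabetical order is an arbitrary choice
    if len(distinct) == 1:
        return {distinct[0]: float(high)}
    step = (high - low) / (len(distinct) - 1)
    return {cat: low + i * step for i, cat in enumerate(distinct)}


# Hypothetical product types of the plotted points.
point_types = ["L", "M", "L", "H", "M", "L"]

opacity_of = category_levels(point_types, 0.3, 1.0)  # one color, different transparency
size_of = category_levels(point_types, 20, 80)        # different marker sizes

point_opacity = [opacity_of[t] for t in point_types]
point_size = [size_of[t] for t in point_types]
print(size_of)  # {'H': 20.0, 'L': 50.0, 'M': 80.0}
```

Evenly spaced levels keep the categories visually separable; a real charting tool may use a different ordering or a non-linear scale.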

 

Multiple Plots

A final useful feature is the ability to create multiple plots within the same chart tab, simply by dragging and dropping another field. For example, let's add Rotational speed [rpm] to the x-axis of the previous size-option chart:

 


Figure 13. Multiple plots, in this case showing tool wear as a function of both torque and RPM. These could not be displayed on the same graph because their x-axis ranges differ drastically. Author’s image

 

Notice that the second plot shows Tool Wear [min] against Rotational speed [rpm]. You can create any number of plots in this manner (e.g., 3, 4, or more).

 

Conclusion

PygWalker is a great, easy-to-use resource for getting started with data analysis. It closely mirrors the features of Tableau, a leading business intelligence platform, to deliver a familiar and intuitive experience. Use this guide and others on the web to try out pygWalker in Google Colab the next time you need to analyze an industrial dataset!