PygWalker Data Visualization Library - Part 2, Advanced Charting Display
PygWalker is a useful tool for visualizing data, with advanced features that make a dataset easier to interpret. This tutorial continues using the milling machine predictive maintenance dataset.
In our first hands-on demonstration article about pygWalker, we built a simple scatter plot.
In this second article, we will explore more advanced visualization capabilities that pygWalker offers. This includes:
- Data Tab
- Bar Plot
- New Chart
- Coloring, Opacity, and Size Options
- Multiple Plots
To see this demonstration in action, use the Google Colab notebook (created by the author) to experiment with the data visualization steps below.
The data tab offers a more user-friendly alternative to the pandas df.head() method for previewing datasets. It can be accessed by clicking the Data button, highlighted in yellow in the image below.
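For comparison, here is a minimal sketch of the pandas route. The sample rows are made up for illustration (only the column names follow the milling machine dataset), and open_walker is a hypothetical helper name. It wraps pyg.walk, the PygWalker call that opens the interactive UI, so the preview still runs where PygWalker is not installed.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M", "L"],
    "Torque [Nm]": [42.8, 46.3, 49.4, 65.7, 39.5, 68.2],
    "Tool wear [min]": [0, 3, 7, 208, 9, 215],
    "Machine failure": [0, 0, 0, 1, 0, 1],
})

# Static preview: only the first three rows, with no sorting or filtering.
preview = df.head(3)
print(preview)


def open_walker(frame):
    """Open the interactive PygWalker UI (needs `pip install pygwalker`)."""
    import pygwalker as pyg  # imported lazily so the preview above works without it
    return pyg.walk(frame)
```

In a notebook cell, calling open_walker(df) would display the UI whose Data button is described here.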
For all figures in this article, you can right-click and 'Open image in a new tab' to see the examples in greater detail.
Figure 1. Data tab for previewing datasets. Author’s image
Notice that the columns are classified as either dimensions or measures. A dimension is a qualitative or categorical column used to classify or describe the data. A good example in the milling machine dataset is Machine failure: the context of every other column changes drastically depending on whether a failure occurred (1) or not (0). Dimensions aid in extracting insights by splitting the data into groups. Conversely, measures are quantitative (numeric) columns that are typically aggregated, for example summed or averaged, to provide value.
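To make the distinction concrete, the sketch below splits columns with a rough heuristic: non-numeric columns, and numeric columns with very few distinct values, are treated as dimensions. This is an assumption for illustration and not necessarily how PygWalker classifies fields. The split_fields name, the max_categories threshold, and the sample values are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M", "L"],
    "Torque [Nm]": [42.8, 46.3, 49.4, 65.7, 39.5, 68.2],
    "Tool wear [min]": [0, 3, 7, 208, 9, 215],
    "Machine failure": [0, 0, 0, 1, 0, 1],
})


def split_fields(frame, max_categories=2):
    """Rough heuristic: text columns and low-cardinality numeric columns are dimensions."""
    dimensions, measures = [], []
    for name in frame.columns:
        column = frame[name]
        is_numeric = pd.api.types.is_numeric_dtype(column)
        if not is_numeric or column.nunique() <= max_categories:
            dimensions.append(name)
        else:
            measures.append(name)
    return dimensions, measures


dimensions, measures = split_fields(df)
print("Dimensions:", dimensions)
print("Measures:", measures)
```

With this sample, Machine failure lands among the dimensions even though it is stored as a number, because it takes only two values.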
Bar plots are useful tools for discovering trends or comparing the measures of categorical data. For example, let’s see how the mean Tool wear [min] time stacks up across different product types.
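For readers who want to check the bars numerically, the same aggregation can be sketched in pandas with a groupby. The rows below are made-up sample values, so the means are not the real dataset's results.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M", "L"],
    "Tool wear [min]": [0, 3, 7, 208, 9, 215],
})

# One bar per product type: the mean of the measure within each category.
mean_wear = df.groupby("Type")["Tool wear [min]"].mean()
print(mean_wear)
```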
To make charts in the Google Colab notebook, click and drag an element from the left side to the X or Y-Axis on the right side. To resize the chart, change the layout mode (purple circle below) to Fixed.
Figure 2. Adding data series and editing the chart space.
Figure 3. Using bar plots to compare and trend data series. Author’s image
Not much of a difference. To drill down further, let’s add a filter.
As engineers and technicians, we are likely curious about what this data looks like during a machine failure. Let's add Machine Failure to the Filter option and select “1” exclusively:
Figure 4. Creating a rule (criteria) for filtering. Author’s image
This rule selects only the entries that contain a 1 in the machine failure column at the far right of Figure 1.
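The same rule can be written as a boolean mask in pandas. This is a minimal sketch on made-up sample rows; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M", "L"],
    "Tool wear [min]": [0, 190, 5, 208, 9, 215],
    "Machine failure": [0, 1, 0, 1, 0, 1],
})

# Keep only the rows where a failure was recorded.
failures = df[df["Machine failure"] == 1]

# Re-run the bar plot aggregation on the filtered rows.
mean_wear_at_failure = failures.groupby("Type")["Tool wear [min]"].mean()
print(mean_wear_at_failure)
```

Product types with no failures in the filtered data (here "H") simply drop out of the result.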
Figure 5. Filtered dataset matching machine failure to tool wear. Author’s image
We can gather slightly more insight from this graph. Product type “L” had the highest mean Tool wear [min] at the time of failure. We might use this data to update preventative maintenance schedules on the milling machine when running product type “L”.
Another function of pygWalker is the ability to sort visualizations in ascending or descending order. In our bar plot, product type “L” had the tallest Tool wear [min] bar. We can sort in descending order to arrange the product types by bar magnitude, using the highlighted button below:
Figure 6. Sorting icon on the toolbar. Author’s image
Sorting arranges the bars in ascending or descending order, which helps us prioritize maintenance or decide where further study is needed.
Figure 7. Sorting the data to show the highest priority items at the top. Author’s image
Now the “L” product type is at the top of the bar graph with “H” and “M” following in order of decreasing magnitude.
As with most sort functions, we can select the ascending sort button to flip the direction of the sort.
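In pandas terms, the two sort buttons correspond to sort_values with the ascending flag flipped. The per-type means below are hypothetical numbers chosen only to illustrate the ordering.

```python
import pandas as pd

# Hypothetical mean Tool wear [min] per product type (illustrative values).
mean_wear = pd.Series({"H": 7.0, "L": 141.0, "M": 6.0}, name="Tool wear [min]")

# Descending: largest bar first.
descending = mean_wear.sort_values(ascending=False)

# Ascending: the same bars in the opposite order.
ascending = mean_wear.sort_values(ascending=True)

print(list(descending.index))
print(list(ascending.index))
```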
Creating a New Chart
If we want to explore the data further without destroying the aforementioned bar plot, we can utilize the “+ New” option:
Figure 8. Creating a new sorting tabulation. Author’s image
This will take us to a clean slate while preserving the work completed in the “Chart 1” tab.
Coloring, Opacity, and Size Options
These options add a visual “pop” to the dashboard and can aid in visualizing complex insights within the dataset. To illustrate, let’s use a more complex example. In Chart 2, my goals are as follows:
- Display Torque [Nm] on the x-axis
- Display Tool Wear [min] on the y-axis
- Filter by Machine Failure = 1
- Color data points by Product Type
- Remove the aggregate option to display actual row instances
The result is the following chart:
Figure 9. New chart with all display options. Author’s image
Figure 10. The new chart, zoomed in showing only the data series in different colors. Author’s image
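As a rough pandas equivalent of the Chart 2 goals, the sketch below keeps the raw (un-aggregated) failure rows and groups them by product type, which is the information a color-by-Type scatter plot draws. The sample rows and the points_by_type name are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M", "L"],
    "Torque [Nm]": [42.8, 66.1, 49.4, 65.7, 39.5, 68.2],
    "Tool wear [min]": [0, 190, 5, 208, 9, 215],
    "Machine failure": [0, 1, 0, 1, 0, 1],
})

# Filter by Machine failure = 1 and keep every matching row (no aggregation).
points = df.loc[
    df["Machine failure"] == 1,
    ["Torque [Nm]", "Tool wear [min]", "Type"],
]

# One group of raw points per product type, i.e. one color per group.
points_by_type = {ptype: group for ptype, group in points.groupby("Type")}
for ptype, group in points_by_type.items():
    print(ptype, len(group), "point(s)")
```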
At a quick glance, we can draw some rough insights from this chart. The data points appear to be concentrated around the top right corner of the graph, which suggests that machine failures occurred more frequently when both Torque and Tool wear were high.
Coloring by Type also shows that more of the failures involve “L” product types than “M” or “H” product types. The caveat is that the product type column is skewed toward “L”: 50% of all observations are “L” product types. The higher number of failures may therefore simply reflect volume, since the milling machine runs “L” product types more often, rather than a higher failure rate. Isn’t it amazing how much can be interpreted from a visual created by dragging and dropping a few UI components?
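One way to separate volume from failure rate is to divide the failure count by the number of runs for each product type. The sketch below uses made-up rows in which “L” has twice as many failures as “M” but the same failure rate; it says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample: six "L" runs, three "M" runs, one "H" run.
df = pd.DataFrame({
    "Type": ["L", "L", "L", "L", "L", "L", "M", "M", "M", "H"],
    "Machine failure": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})

by_type = df.groupby("Type")["Machine failure"]
failure_count = by_type.sum()   # what the colored scatter plot shows
failure_rate = by_type.mean()   # failures divided by runs for each type

print(pd.DataFrame({"failures": failure_count, "rate": failure_rate}))
```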
Instead of using color to display the product Type, we can leverage the opacity and size options, which work with measures. Using a measure for the opacity option yields the graph below:
Figure 11. Changing the opacity of a data series. Author’s image
The opacity option looks similar to the color option. However, it uses a single color at varying levels of transparency to represent the magnitude of the Torque [Nm] measure.
Lastly, the size option yields the graph below:
Figure 12. Changing the size of a data series can be more visually engaging than opacity. Author’s image
It uses circles of different sizes to represent the magnitude of the Torque [Nm] measure.
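Conceptually, both options map a measure onto a visual range. The sketch below shows one common way to do this, a linear min-max scaling. It is an assumption for illustration, not PygWalker's actual implementation, and the scale_to_range name, the output ranges, and the torque values are hypothetical.

```python
import pandas as pd

# Hypothetical torque readings (illustrative values).
torque = pd.Series([40.0, 50.0, 60.0, 70.0], name="Torque [Nm]")


def scale_to_range(values, low, high):
    """Linearly map values onto [low, high]; constant input maps to the midpoint."""
    span = values.max() - values.min()
    if span == 0:
        return pd.Series((low + high) / 2, index=values.index)
    unit = (values - values.min()) / span  # 0.0 at the minimum, 1.0 at the maximum
    return low + unit * (high - low)


opacity = scale_to_range(torque, 0.2, 1.0)    # faint to solid
size = scale_to_range(torque, 10.0, 100.0)    # small to large circles

print(opacity.round(2).tolist())
print(size.round(1).tolist())
```

The lowest torque gets the faintest, smallest marker and the highest torque gets the most solid, largest one.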
A final useful feature to note is the ability to create multiple plots within the same chart tab. We can accomplish this by simply dragging and dropping another field. For example, let's add Rotational speed [rpm] to the x-axis of the previous size option chart:
Figure 13. Multiple plots, in this case, show tool wear as a function not only of torque but also RPM, which cannot be displayed on the same graph, due to the drastically different x-axis ranges. Author’s image
Notice that a second plot pairs Tool Wear [min] with the additional field, Rotational speed [rpm]. You can create any number of plots in this manner (e.g., 3, 4, ...).
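A quick way to see why the two x-axis fields need separate plots is to compare their ranges. The sketch below does this with made-up sample values; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Torque [Nm]": [42.8, 46.3, 65.7, 68.2],
    "Rotational speed [rpm]": [1551, 1408, 1282, 2861],
    "Tool wear [min]": [0, 3, 208, 215],
})

# Minimum and maximum of each field placed on the x-axis.
x_fields = ["Torque [Nm]", "Rotational speed [rpm]"]
ranges = df[x_fields].agg(["min", "max"])
print(ranges)
```

In this sample, even the smallest rotational speed is far larger than the largest torque, so a shared x-axis would squeeze the torque points into a thin sliver.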
PygWalker is a great, easy-to-use resource for getting started with data analysis. It borrows many features from Tableau, the leading business intelligence platform, to deliver a familiar and intuitive experience. Use this guide and others on the web to try out pygWalker in Google Colab the next time you need to analyze an industrial dataset!