Technical Article

Data Model and Functionality of Utility Data Warehouse | UDW Part 1

January 22, 2024 by Munir Ahmad

Storing data long-term for recording and reporting purposes is the job of the utility data warehouse, quite different from data-collection processes that optimize systems and predict failures.

The UDW, or utility data warehouse, is a system for storing time series, historical, and future data. The UDW server's main functionality is storing data in a relational database management system (RDBMS) for use by various other client applications. The diagram below shows possible operation functions for industrial applications.

 

UDW function diagram

Figure 1. Presentation, browsing, and updating are functions for viewing and updating data stored in the UDW

 

Tabular and graphical presentation, browsing, and Excel-based reports are the functions for viewing and updating data stored in the UDW.

Modern SCADA systems can store historical data for creating reports like hourly, daily, weekly, and annual reports. All power utilities need to record information for reporting purposes to measure the performance of the power system operations. However, many industrial control systems are usually not designed and equipped with analysis tools, so if we integrate the UDW functionality (especially reporting) in the online SCADA system, it will load the system unnecessarily.

The UDW system is connected to some external systems, typically a SCADA/EMS/DMS. The data source is a function that records data into the UDW and further analysis and data mining are functions for refining and processing that data. Analysis and data mining software normally require the creation of queries against the UDW database; thus, the knowledge and understanding of the UDW data model are prerequisites.

 

UDW Data Model and Basic Concepts

The utility data warehouse, UDW, provides a comprehensive set of functions for processing all types of historical information within a power generation utility. Real-time data is sampled from databases like Oracle, Microsoft SQL Server, and PostgreSQL and stored in the UDW database with the results from the calculations applied to the sampled information. To understand the UDW, it is imperative to understand the following key concepts and terms.

 

Data Object Type

In an object-oriented language, a type is a class with properties or attributes and relations. In relational language, a data object type is a table with columns that have relations to other tables using foreign keys. The foreign key is a column referencing a row in any table. Some examples of data object types from the SCADA system real-time database are measurand, indication, and station.

The examples of columns or properties for the data object types are

MEASURAND (CURRENT VALUE): which is the real-time current value

MEASURAND (EXTERNAL IDENTITY): the name and unique identification

MEASURAND (UNACK ALARM): Unacknowledged alarm exists

MEASURAND (STATION IREF): Reference to the station describing the concept of foreign.

MEASURAND (ACTUAL STATUS): The indication status is Open/close, Yes/No

 

Data Object

The data object is an entity, object, or instance, and in the database language, it is a row or set of elements in a table. Objects can be related to power system objects in the SCADA system, e.g. generator (GENERATOR), transformer (TRAN_2W or TRAN_3W), or breaker (SWITCH).

Each data object has a type that describes it, and values are assigned to the columns described by the type, e.g.

STATION (STATION ID) = TRBL

MEASURAND (EXTERNAL IDENTITY) = TRBL220_Unit1_MW

MEASURAND (VALUE CURRENT) = 140

MEASURAND (Engr. Unit) = MW

MEASURAND (STATION IREF) = the record number to station TRBL

 

Time Series

The time series is a sequence of time-stamped and probably with quality-coded values. The sequence is related to the data object and contains the historical values for the object. Any columns or attributes for a data object type can be recorded as a time series. Access to time series data supports common functions like retrieval, insert, update, and deletion.

 

Lifetime and Modification

Most objects in the UDW have a lifespan and are created, modified many times, and possibly terminated at the end. Object properties or attributes may be modified, e.g. if the object name is changed, this change is kept tracked in the UDW. This means that getting the object name at a time before the modification shall return the old name, not the new. Each modified object is stored as a separate row in the UDW tables.

 

Technician monitoring a display and control panel

Figure 2. Monitoring, tracking, and reporting data is the main purpose of the UDW. Image used courtesy of Adobe Stock

 

Methods of Data Collection and Sampling

The UDW server stores data originating from a SCADA/EMS/DMS system and it has the feature to store data from other sources as well. The multiple types of data can be sampled from SCADA/EMS/DMS system some examples are given below:

  • Data including telemetered digital, analog, and calculated values
  • Power application values, set points, unit response, energy deviation, etc.
  • System node's status and health, availability, communication, the RTUs, and data links availability
  • Power system events and alarms

The SCADA/EMS data is usually sampled from the primary real-time database and then inserted into the UDW server. The highest sampling frequency is limited to 1 second however, different sampling frequencies can be specified. The data sampling is done in the following different ways.

 

Continuous Sampling

The value is sampled from SCADA/EMS regularly, e.g. every 10 seconds, and this method can take a lot of storage space. An example of such a type of sampling is related to equipment statistics.

Time Value Store?
10.00 100 Yes
10.10 100 Yes
10.20 100 Yes
10.30 101 Yes

 

Sample by Event

Like continuous sampling, only changed values are sampled at each interval, consequently reducing the storage cost. In event sampling, all process values are checked regularly, but only sampled if values are changed from the last stored value. It is the most frequently used method for inserting data into a historian database, thus reducing the database size because no unnecessary storage of duplicate values is stored as shown below in the table.

Time Value Store?
10.00 100 Yes
10.01 100 No
10.02 103 Yes
10.03 103 No

 

External by Event

This method requires already time-stamped values from the external equipment i.e. RTU. In this mode, rapid state changes may be captured on a millisecond level. The external sampling can update the UDW with old points of time if the RTU supports the buffering of data, the historical database can therefore be updated afterward.

The RTU collects the following time series data directly transferred to the UDW server.

  • The indications, measurements, and accumulators are supported
  • The timestamp from the RTU is used
  • Millisecond accuracy in the timestamp, if available
  • Support for milliseconds in trends and reports
  • Local buffering in RTU if the communication is down
  • Optionally, the time when the data reaches the SCADA system (called ‘store time’) can be stored in an additional column

 

Analysis and Data Mining

When receiving and examining data from the UDW database, the objective is to find time series data using specific criteria. e.g. getting the maximum loads on the transmission lines for a number of days with a temperature below a certain value. Only the end user knows precisely what sort of data is stored in the database. The customized query statement may be created to fetch the desired data in several steps. As already stated, the data objects have a lifetime and may have been modified. Hence, a query statement will also need to consider the different versions that may exist for objects.

 

Summary

As discussed above, a data warehouse is a more purpose-specific, specialized database system designed to store and manage data, related to the monitoring, control, and operation of power system utilities such as generation, transmission, and distribution.

The following article will cover the archiving and restore method, calculations on stored data, the post-disturbance review (PDR) function, and the user interface for data viewing.