Historian

In the previous sections, we saw how data acquisition in SCADA systems starts from the controllers (the PLCs or the DCSs) or RTUs gathering measurements from sensors and equipment through industrial protocols. The digital representations of these measurements are usually called tags or datapoints. Each represents a single input or output signal that is monitored or controlled by the system, and usually appears as a value-timestamp pair. After generation, the data can also be sent to other monitoring servers for analysis by humans, or to the MES system for planning and maintenance. At the same time, the data often feeds a specialized database for storing and managing time series. This specialized database is called a Historian, or data Historian. Historians are not relational or NoSQL databases; they have fewer capabilities and features and a much simpler structure. However, they are designed and optimized for managing time series. They can acquire data from the controllers, ingest huge amounts of data, optimize storage, and retrieve data quickly.

Historians developed and evolved alongside the growth of automation in industrial processes. As production processes have become more automated, controllers more powerful, industrial protocols more standardized, and data and information more available, the need to collect, store, manage, process, and retrieve industrial signals in plants has grown. A typical small company may have a single Historian, while larger companies are likely to have a Historian in each plant and another at company headquarters to collect the data coming from all of the plants.

Up until a few years ago, the cloud was not a real option for databases such as this. Because Historians acquire data from controllers and require high throughput in both input and output, they need to be close to their data sources, and so they have generally lived on premises. Recently, however, a generation of cloud-native time series databases (TSDBs) has been developed to store time series in the cloud. Over the next few years, they may well replace on-premises Historians or, more likely, be tightly integrated with them. We may well see a scenario in which the on-premises Historian acts as a data collector and temporary storage, pushing all data to a cloud-native TSDB. In the following paragraphs, we will refer only to on-premises Historians.

Historians need to gather data from controllers. They implement software interfaces for the most common industrial protocols and fieldbuses, including Profibus, Modbus, DeviceNet, and OPC, and sometimes also legacy protocols for old plants or DCSs. Connectivity to industrial devices is one of their most important features. Some Historians implement just a handful of the industrial network protocols, while others, such as OSI-PI, implement many more. These interfaces gather the data in the most efficient way allowed by the capabilities of the specific protocol. Data acquisition by polling is always supported, as is the deadband mechanism. Unsolicited data collection depends on the capabilities of the underlying protocol. In addition, values that do not come directly from the controllers but are instead calculated or derived from them can be inserted through specific interfaces or APIs.
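A polling acquisition cycle can be sketched as follows. This is a minimal illustration, not a real protocol driver: `read_tag` is a hypothetical stand-in for a protocol-specific read (for instance, a Modbus register read), and it returns a simulated measurement here.

```python
import random
import time

def read_tag(address):
    """Hypothetical stand-in for a protocol-specific read
    (e.g. a Modbus register read); returns a simulated value."""
    return round(20.0 + random.random(), 2)

def poll(addresses, interval_s, cycles):
    """Poll each tag address at a fixed interval, emitting the
    (tag, timestamp, value) triples a Historian ingests."""
    samples = []
    for _ in range(cycles):
        now = time.time()
        for addr in addresses:
            samples.append((addr, now, read_tag(addr)))
        time.sleep(interval_s)
    return samples

data = poll(["40001", "40002"], interval_s=0.01, cycles=3)
print(len(data))  # → 6 samples: 2 tags x 3 polling cycles
```

An unsolicited (report-by-exception) scheme would instead be driven by callbacks from the device, which is why it depends on the underlying protocol's capabilities.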

Historians provide fast insertion rates: up to tens of thousands of tag values can be processed per second. This performance is made possible by a dedicated buffer area that keeps recent values in memory, sorted by increasing timestamp, before writing the data to disk. Historians also provide a functionality called store and forward, which prevents data loss during planned or unplanned network outages between the data sources and the Historian server. To store large amounts of data with minimal disk usage and acceptable approximation errors, Historians often rely on efficient data filtering and lossy compression engines. The data filtering mechanism ignores incremental changes in values that fall within a deadband centered around the last reported value. It only takes into account new values that fall outside the deadband, and then re-centers the deadband around the new value. Essentially, it identifies and discards repeated values.
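The deadband filtering step described above can be sketched in a few lines of Python. This is a minimal illustration of the mechanism, not any vendor's implementation; the sample signal and the deadband width are invented for the example.

```python
def deadband_filter(samples, deadband):
    """Keep only samples that move outside a deadband centered on the
    last reported value; the deadband re-centers on each kept value.
    `samples` is an iterable of (timestamp, value) pairs."""
    kept = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > deadband:
            kept.append((ts, value))
            last = value  # re-center the deadband on the new value
    return kept

# A jittery signal: small fluctuations are discarded, real moves survive.
raw = [(0, 10.0), (1, 10.2), (2, 9.9), (3, 12.0), (4, 12.3), (5, 8.0)]
print(deadband_filter(raw, deadband=1.0))
# → [(0, 10.0), (3, 12.0), (5, 8.0)]
```

Note that the two small fluctuations around 10.0 and 12.0 are dropped, while each genuine change of more than one unit is reported and becomes the new deadband center.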

The data compression mechanism is applied to all values that pass through the data filtering mechanism. Its implementation depends on the specific Historian product, but the basic idea is to identify and discard redundant data without losing content from an information perspective. The compression algorithms hold the last one or two values in memory, building up a sort of swinging door, and store or discard the held values based on the next value acquired. If swinging the door, by replacing the held value with the last acquired value, loses no information, the held values are discarded. The loss of information caused by discarding the held values depends on the specific Historian product; it is a trade-off between storage allocation and the amount of information lost in the discard, and it can be tuned by the user through specific settings. The values that are physically stored at the end are typically called raw data.
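A simplified version of the swinging-door idea can be sketched as follows. This is an illustrative reduction of the technique, not a specific product's algorithm: a held point is discarded as long as every point since the last stored one can still be reconstructed, within a tolerance `dev`, by a single straight line; when the two "doors" (the tightest upper and lower slopes) close against each other, the held point must be stored.

```python
def swinging_door(samples, dev):
    """Simplified swinging-door trending over sorted (timestamp, value)
    pairs: store a point only when a straight line from the last stored
    point can no longer represent the signal within +/- dev."""
    if len(samples) < 3:
        return list(samples)
    stored = [samples[0]]
    slope_max = float("inf")   # upper door
    slope_min = float("-inf")  # lower door
    held = samples[0]
    for t, v in samples[1:]:
        t0, v0 = stored[-1]
        # Tighten both doors using the new point.
        slope_max = min(slope_max, (v + dev - v0) / (t - t0))
        slope_min = max(slope_min, (v - dev - v0) / (t - t0))
        if slope_min > slope_max:
            # Doors closed: the held point carries real information.
            stored.append(held)
            t0, v0 = held
            slope_max = (v + dev - v0) / (t - t0)
            slope_min = (v - dev - v0) / (t - t0)
        held = (t, v)
    stored.append(held)  # always keep the most recent value
    return stored

# A perfect ramp compresses to its two endpoints...
ramp = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0)]
print(swinging_door(ramp, dev=0.1))  # → [(0, 0.0), (3, 3.0)]
```

A signal with a corner, by contrast, forces the corner point to be stored, which is exactly the trade-off the text describes: straight-line segments cost almost nothing, while genuine changes of direction are preserved.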

Historians are fundamental in technical information systems, providing data for plant-operating applications and business intelligence through time series-specific features such as interpolation, re-sampling, and retrieving pre-computed values from raw data. Interpolation is a very important feature that differentiates Historians from other types of databases. Raw data is stored in the Historian according to each tag's data collection interval and depending on its variations. Even tags that share the same data collection interval end up with values stored at different timestamps; the same happens for tags whose data collection is configured as unsolicited. This means that the signals related to the same physical asset, for instance, a tank, have their raw data stored at different timestamps. If we need a snapshot of the status of that asset at a specific time, most of its signals will not have raw data for that exact time; the value is provided by the Historian through interpolation. Different algorithms rebuild the missing data from the raw data, also providing an indication of how accurate the interpolation is. Values that are not measured directly, including key performance indicators or diagnostics, may be computed automatically from the input flow and then stored in the Historian archives.
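The simplest such algorithm, linear interpolation between the two raw samples bracketing the requested timestamp, can be sketched as follows. The tag data is invented for the example; real Historians offer several interpolation modes, of which this is only the most basic.

```python
import bisect

def interpolate(raw, ts):
    """Linearly interpolate a tag value at an arbitrary timestamp.
    `raw` is a list of (timestamp, value) pairs sorted by timestamp."""
    times = [t for t, _ in raw]
    i = bisect.bisect_left(times, ts)
    if i < len(times) and times[i] == ts:
        return raw[i][1]  # exact raw sample, no interpolation needed
    if i == 0 or i == len(times):
        raise ValueError("timestamp outside the archived range")
    (t0, v0), (t1, v1) = raw[i - 1], raw[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)

# Raw data stored at irregular timestamps (e.g. after filtering).
level = [(0, 10.0), (30, 16.0), (100, 2.0)]
print(interpolate(level, 15))  # → 13.0
```

Given one such function per tag, a snapshot of a whole asset at time `ts` is just the interpolated value of each of its signals at that same `ts`, even though none of them may have a raw sample stored exactly there.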

Some Historians store time series following a hierarchical data model that reflects the operating environment. This data model refers to the asset model and is consistent throughout the plant's organization, both to ease browsing and to group similar time series by subsystem. Sometimes the asset model is a separate module within the Historian application; other times, it is an external application or a module of the MES system to which the tags are linked. In either case, the asset model is a fundamental piece of the industrial data architecture, as we will see in the section on MES later.
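The effect of such a hierarchy can be illustrated with a small sketch. The plant, area, and asset names below are hypothetical; the point is simply that path-structured tag names let all the time series of one asset be browsed together.

```python
# Hypothetical tag names encoded as asset-model paths.
tags = [
    "Plant1/Area2/Tank01/Level",
    "Plant1/Area2/Tank01/Temperature",
    "Plant1/Area2/Pump07/Speed",
]

def build_tree(paths):
    """Build a nested dict mirroring the asset hierarchy, so tags
    can be browsed by plant, area, and asset rather than by name."""
    root = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root

tree = build_tree(tags)
print(sorted(tree["Plant1"]["Area2"]["Tank01"]))
# → ['Level', 'Temperature']
```

Whether the hierarchy lives inside the Historian or in an external MES module, the linkage is the same: each leaf of the asset tree points at one archived time series.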

Visualization features are provided by standalone services or web applications supplied with Historians. They facilitate sophisticated exploration of the archived data, displaying plots, tables, and time series trends that retrieve only the representative inflection points for the considered time range, as well as statistics and other synoptics. Historians also provide a proprietary SQL interface, a sort of SQL dialect with extensions to support their specific features.

Roughly speaking, Historians are characterized as follows:

  • A simple schema structure, based on tags, with no support for transactions
  • An SQL interface to query the archived data, offering ad hoc features for time series
  • Connectivity to the industrial protocols to gather data from the controllers
  • Data filtering and compression mechanisms to optimize bandwidth and storage consumption
  • A store and forward mechanism, which makes them resilient against the instability of network connectivity
  • The ability to manage a high volume of append-only data
  • A replica mechanism, often called Historian-to-Historian, for pushing streaming data between two instances of the same Historian product
  • Raw-data interpolation and built-in capabilities for managing industrial data
  • Visualization capabilities and asset model support

From an architecture perspective, Historians can be split into two main categories:

  • Historians that use a proprietary architecture for storing the data. In this category, we have OSI-PI (OSIsoft), Proficy Historian (GE), and InfoPlus21 (AspenTech).
  • Historians that use a third party such as Microsoft SQL Server for storing the data. In this category, we have Wonderware Historian (Schneider), FactoryTalk Historian (Rockwell Automation), PHD (Honeywell), and SIMATIC IT Historian (Siemens).

Very often, the companies who develop controllers offer a product suite where each product covers a specific area of the plant's automation system, while at the same time providing an integrated and unified interface to the users. This means that, in the same suite, there are tools to develop, test, and download the control program, a SCADA system, a Historian application, a tool for designing and maintaining the asset model, and other tools and applications that make up the MES system.

Some of these include the following:

  • FactoryTalk (Rockwell Automation)
  • SIMATIC IT (Siemens)
  • Proficy (GE)
  • aspenONE (AspenTech)
  • Wonderware (Schneider)
  • PI System (OSIsoft)