Let’s talk about data baby, let’s talk about O and T

I saw a post on LinkedIn where the poster (postee?) was advocating for modern PLCs to run SQLite to provide a historical data store. Whether you agree with this or not, doing so would create yet more siloed data within an organisation. Some organisations will have many PLCs spread across many locations, so you can quickly see how these silos multiply and get out of control. In 2022, organisations must be looking at how they connect these siloed datasets together to drive operational and organisational insights. Better still, don’t create the silo in the first place.

The various major clouds have been around for well over a decade now so data lake and data warehousing patterns have been well established. No one needs to reinvent the wheel!

In keeping with the idea of this newsletter, let’s take a look at a simple data platform architecture that could pull data from OT systems into the cloud and report on it in a simple way that’s shareable across the organisation. This would be a minimum viable product, but with a data engineer and a platform engineer, an organisation could be up and running in a relatively short time, with foundations to build upon once senior management has seen the value the MVP can drive.

The above diagram illustrates how this can be engineered. There is an optional gateway that can be used to isolate the OT network from the public network, giving your OT network additional protection and greater control over the data that’s surfaced, but it is likely to be the single greatest hardware cost.

I’ve kept the edge compute generic; various vendors provide software that can run on edge compute, interface with the OT network devices, and push the data into the cloud. At Intelligent Industries, we typically run Azure IoT Edge with our own bespoke modules that can be configured to the client’s requirements.
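To make the edge module’s job concrete, here is a minimal sketch of the kind of payload such a module might build before pushing a reading to the cloud. The device and sensor names are hypothetical, and the Azure SDK call mentioned in the comment is how this would typically be sent from an IoT Edge module; this sketch only constructs the message.

```python
import json
from datetime import datetime, timezone

def build_telemetry(device_id: str, sensor: str, value: float, unit: str) -> str:
    """Wrap a single OT reading in a JSON payload for device-to-cloud messaging."""
    return json.dumps({
        "deviceId": device_id,
        "sensor": sensor,
        "value": value,
        "unit": unit,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

# In a real IoT Edge module you would send this payload with the Azure IoT
# Device SDK, e.g. IoTHubModuleClient.send_message_to_output(payload, "telemetry"),
# with the hub connection and routing configured in the deployment manifest.
# "plc-line-1" and the sensor name here are illustrative placeholders.
payload = build_telemetry("plc-line-1", "temperature", 72.4, "degC")
print(payload)
```

The edge module would typically poll the PLC on a schedule, build one of these messages per reading, and let IoT Edge handle buffering and retries when connectivity drops.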

Bridging the cloud and the edge is Azure IoT Hub, which provides IoT device management and secure, two-way communication for sending and receiving data.

At this point, you could simply configure IoT Hub to export the telemetry data to an Azure storage account, or configure Azure Data Explorer to collect the IoT data for later processing and analysis, but that’s not driving any organisational value, so let’s look at the next simple steps.

The OT telemetry data flows to the IoT Hub, but you want a hot path with real-time processing of the data that can be used for some simple reporting. You can use Azure Stream Analytics (one of the stream processing technologies available in Azure) to push the telemetry data straight to Power BI for display on a real-time dashboard. You can additionally push to data stores such as Synapse (data warehouse) or Cosmos DB (globally distributed, enterprise-scale NoSQL). Finally, you can use Stream Analytics to run machine learning models for tasks such as real-time anomaly detection, outputting to Azure Functions that can raise alerts and notifications for viewing in a web portal or via email or SMS.
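As a rough illustration of what real-time anomaly detection on the hot path does, here is a minimal stand-alone sketch using a rolling-window z-score check. This is not the Stream Analytics implementation (that would be a streaming query plus a built-in or custom ML function); it just shows the idea: compare each new reading against the recent window and flag outliers, and the sample values are invented.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flag readings more than `threshold` standard deviations from the
    mean of a rolling window of recent readings."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def check(self, value: float) -> bool:
        anomaly = False
        if len(self.values) >= 5:  # need a few readings before judging
            mu, sigma = mean(self.values), stdev(self.values)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomaly = True
        self.values.append(value)
        return anomaly

detector = RollingAnomalyDetector()
stream = [70.1, 70.3, 69.9, 70.2, 70.0, 70.4, 95.0]  # last reading is a spike
flags = [detector.check(v) for v in stream]
print(flags)  # -> [False, False, False, False, False, False, True]
```

In the architecture above, a flagged reading is what would be routed to an Azure Function to send the email or SMS alert.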

Now the telemetry is in Power BI, you can create a dashboard that can be shared with other employees within the organisation, showing the real-time status of production, etc.

In the next newsletter, we’ll look at how we can also bring other systems data, such as ERP and MRP into the data platform so we can start to get that bigger-picture view.

Simple dictionary of terms

Data lake: A single store of data, usually including raw copies of source system or sensor data.

Data warehouse: A database optimised for analytics and reporting.

ERP: Enterprise Resource Planning software.

MRP: Manufacturing Resource Planning software.

MVP: Minimum Viable Product. The minimum requirements for a product or solution.

OT: Operational technology. Hardware and software within the industrial network, controlling or monitoring assets.

PLC: Programmable Logic Controller. A rugged computer used within industrial environments, typically programmed in ladder logic to read sensors and control actuators such as drives.

Stream processing: Processes data as a sequence of events in time.

Telemetry: The data from remote sensors and systems, typically in real-time.