An In-Depth Guide to Building a Streaming Analytics Platform

I often talk to manufacturers who are looking to build a capability to drive business value with real-time production data. They may have had success with small pilots, in which they used operational data to tackle a specific initiative such as predictive maintenance.

Their instincts are right: moving beyond retrospective, batch-based analytics is the path to unlocking significant business benefits. Still, pilot applications typically focus on one-off use cases, without a clear path to operationalizing data streaming in from all production assets. This makes it very difficult to extend an early pilot success into an approach that scales to new processes across the organization.

So how do you achieve scale with streaming analytics in manufacturing? I’ve spent the past five years leading a team in building a platform to acquire, integrate, model, and analyze streaming production data to address manufacturing operations. Over the course of that tenure, I’ve seen some successes and just about every challenge you might imagine… and some you might not. Most of these only came to light as we moved to scale our platform across multiple processes and use cases with some of the largest global manufacturers.

As to the lessons I’ve learned: In a nutshell, what may at first seem like a fairly typical, compartmentalized IT development project is more accurately a multi-year undertaking involving a radical change in the way companies use real-time information to support their core business. Let’s unpack some of the major technical hurdles posed by each phase of manufacturing analytics: data acquisition, modeling/contextualization, and analysis/visualization.

Phase 1. Data Acquisition

This first stage involves acquiring data from equipment and systems involved in production. When it comes to gathering data, IT professionals are well acquainted with the need for protocol translation: creating a common language or format for different data types, which are especially abundant in manufacturing. Similarly, many IT organizations have focused on aggregating manufacturing data in a data lake or cloud repository. That being said, let’s examine a few manufacturing-specific issues that can challenge even the savviest IT teams:

Reconciling and synchronizing different data rhythms

Acquiring streaming data is foundational to building a real-time analytics platform. “Data rhythm” — the frequency with which input arrives — varies greatly: one sensor may send 10 readings a second, while another might only send information at the beginning or end of each shift. An analytics system must therefore be able to persist these varied rhythms. Additionally, the clocks of different systems can be offset from one another or set to different time zones. Given that a typical manufacturing process generates hundreds of such streaming inputs, the complexity of orchestrating them all can be staggering.
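
To make this concrete, here is a minimal sketch of aligning two very different rhythms — a 10 Hz sensor and a once-per-shift record — using pandas. The sensor names, values, and time zone are illustrative, not taken from any specific deployment:

```python
# A minimal sketch of reconciling two data rhythms with pandas.
# Assumes a fast sensor stream (10 Hz) and a slow, once-per-shift stream;
# both are normalized to UTC before alignment.
import pandas as pd

# Fast stream: 10 readings per second from a temperature sensor.
fast = pd.DataFrame({
    "ts": pd.date_range("2024-01-01 08:00", periods=5, freq="100ms", tz="UTC"),
    "temp_c": [71.2, 71.3, 71.1, 71.4, 71.2],
})

# Slow stream: one record per shift, reported in local plant time.
slow = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00"]).tz_localize("America/Chicago"),
    "shift_id": ["A"],
})
slow["ts"] = slow["ts"].dt.tz_convert("UTC")  # reconcile time zones

# As-of join: attach the most recent shift record to each sensor reading,
# persisting both rhythms instead of resampling one of them away.
merged = pd.merge_asof(fast.sort_values("ts"), slow.sort_values("ts"),
                       on="ts", direction="backward")
print(merged)
```

The as-of join is the key design choice here: each stream keeps its native rhythm, and the slower one is attached to the faster one at query time rather than being forced onto a common clock.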

Configuring, monitoring, and updating data-capture systems

Another challenge is setting up, monitoring, and managing all the systems and devices capturing information from production machines. What if a device goes down, stops delivering data, loses its internet connection, or fills its disk? What if a configuration needs updating across your entire production footprint? Automated functions are required to track and manage all these possibilities. Trust me: these capabilities take time and expertise to build, and they add no small measure of cost to the overall development budget.
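
As an illustration, the core of a fleet health check might boil down to something like the sketch below. The device names, heartbeat timeout, and disk threshold are hypothetical, and a real system would also handle configuration rollout and alert routing:

```python
# A minimal sketch of automated health checks for edge data-capture devices.
# Device names and thresholds are illustrative, not from any specific product.
from datetime import datetime, timedelta, timezone

HEARTBEAT_TIMEOUT = timedelta(minutes=5)
DISK_ALERT_THRESHOLD = 0.90  # alert when a device disk is 90% full

# Last heartbeat and disk usage reported by each capture device.
fleet = {
    "edge-press-01": {"last_seen": datetime.now(timezone.utc), "disk_used": 0.42},
    "edge-oven-02": {"last_seen": datetime.now(timezone.utc) - timedelta(minutes=12),
                     "disk_used": 0.95},
}

def check_fleet(fleet, now=None):
    """Return a list of (device, problem) pairs needing operator attention."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    for device, status in fleet.items():
        if now - status["last_seen"] > HEARTBEAT_TIMEOUT:
            alerts.append((device, "no heartbeat: possible outage or lost connection"))
        if status["disk_used"] >= DISK_ALERT_THRESHOLD:
            alerts.append((device, f"disk {status['disk_used']:.0%} full"))
    return alerts

for device, problem in check_fleet(fleet):
    print(f"ALERT {device}: {problem}")
```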

Phase 2. Data Modeling

In this phase, data models are created to address a specific problem or use case. Blending data from various sources is a standard aspect of modeling. Other considerations are equally important but perhaps less well known. These include:

Stream blending and transforming

Blending and processing data in real time is one of the most important and challenging aspects of streaming analytics. Unlike in batch-oriented POCs, when data arrives continuously you must decide when to output a transformed record. Think of this as a table continuously adding rows: you routinely need to select and process a specific set of them. As mentioned earlier, data streams in from numerous sensors at different times, and input can arrive out of chronological order. That is, when systems blend data together to build a model of a process or machine, data relevant to a given time window may come in much later. Moreover, some inputs arrive once a day and need to be appended to subsequent records. Your modeling software must have the intelligence to reconcile all these variables to accurately represent the production process, and it must correct for late or sparse data as it outputs records in real time. This foundational aspect of stream processing is often overlooked when trying to scale from offline or batch POC initiatives.
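
The decision of when to emit a record is usually handled with windows and a watermark. Here is a minimal sketch, in plain Python, of tumbling-window aggregation with an allowance for late-arriving data; the window length, lateness allowance, and event values are illustrative, and a production system would use a proper stream processor rather than this toy loop:

```python
# A minimal sketch of windowed stream blending with allowed lateness.
# Events carry their own (event_time, value); the watermark tracks how far
# the stream has progressed, so windows are emitted exactly once.
from collections import defaultdict

WINDOW_SECONDS = 60       # tumbling one-minute windows
ALLOWED_LATENESS = 120    # hold a window open 2 minutes past its end

windows = defaultdict(list)   # window start -> readings
emitted = set()

def on_event(event_time, value, watermark):
    """Assign an event to its window; emit windows the watermark has passed."""
    start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
    if start in emitted:
        return  # too late: window already emitted; route to a correction path
    windows[start].append(value)
    for w in sorted(windows):
        if watermark > w + WINDOW_SECONDS + ALLOWED_LATENESS and w not in emitted:
            readings = windows.pop(w)
            emitted.add(w)
            print(f"window {w}: avg={sum(readings)/len(readings):.2f}")

# Events arrive out of chronological order; the watermark trails the stream.
for t, v, wm in [(5, 10.0, 5), (65, 11.0, 65), (30, 9.0, 70), (200, 12.0, 260)]:
    on_event(t, v, wm)
```

Note how the late event at t=30 still lands in its correct window because the watermark has not yet closed it; that trade-off between latency and completeness is exactly the tuning knob batch pilots never have to confront.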

Adding new data to historical models

Let’s say a process has been modeled using sensor and quality data, to investigate something that’s gone wrong in production. Later on, it’s determined that information about raw materials collected by a different system (for example, your ERP system) is also relevant and needs to be incorporated. The platform must support the capability to blend historical data from these new sources with existing models, without disturbing correlations or breaking the existing machine-learning algorithms.
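
In practice, the backfill often looks like a key-preserving join: the sketch below blends a hypothetical ERP raw-material feed into existing batch records without altering any existing field. Column names are invented for illustration:

```python
# A minimal sketch of backfilling a new source into an existing model.
# Column names (batch_id, raw_material_lot) are hypothetical.
import pandas as pd

# Existing modeled records: sensor + quality data keyed by batch.
model = pd.DataFrame({
    "batch_id": ["B1", "B2", "B3"],
    "avg_temp": [71.2, 73.8, 70.9],
    "defect_rate": [0.01, 0.04, 0.02],
})

# Newly relevant ERP data: raw-material lot per batch.
erp = pd.DataFrame({
    "batch_id": ["B1", "B2"],
    "raw_material_lot": ["LOT-17", "LOT-18"],
})

# A left join preserves every existing record (and its correlations);
# batches with no ERP match simply carry a null for the new field.
enriched = model.merge(erp, on="batch_id", how="left")
print(enriched)
```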

Solving other use cases

Most pilot projects batch-model information relevant to a specific machine, process, or issue. This imposes a severe limitation: the data set, and the algorithms built on it, aren’t readily scalable to other assets, machines, or problems. To enable that, the system needs to be architected with standard, flexible data models that can be extended to apply to any number of future use cases. Once again, these are very complex capabilities to build and incorporate.
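
One common approach is a standard record type whose fixed fields apply to every asset, while use-case-specific readings live in a flexible mapping. The sketch below illustrates the idea; the field names are invented, not any vendor's actual schema:

```python
# A minimal sketch of a standard, extensible record type: the fixed fields
# cover every asset, and use-case-specific readings go in a flexible mapping,
# so new machines and new problems reuse the same schema.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CycleRecord:
    machine_id: str
    start: datetime
    end: datetime
    # Use-case-specific signals (pressures, temperatures, torque...)
    # live here, so the core schema never changes per asset.
    readings: dict[str, float] = field(default_factory=dict)

press = CycleRecord("press-01", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 8, 1),
                    readings={"peak_pressure_bar": 182.0})
oven = CycleRecord("oven-02", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 8, 30),
                   readings={"zone1_temp_c": 240.0, "zone2_temp_c": 255.0})
print(press, oven, sep="\n")
```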

Designing data models to interoperate

Suppose your team builds an application that models data for a particular process, then builds a second application to analyze what happens to parts as they travel through production. The real value of manufacturing analytics is only achieved when you can connect such data models. This enables insights like understanding final part quality as it relates to all the machines that worked on the part. Your data and modeling environment must therefore be designed to let you investigate cross-process and part/batch relationships between and among models. In an attempt to scale real-time analysis, large organizations will often add resources to develop many applications in parallel. However, in the absence of a common, holistic data modeling framework, the outcome is disjointed applications that don’t interoperate. As complex as scalable, domain-specific data models can be to architect, they are nevertheless fundamental to addressing most manufacturers’ long-term analysis needs.
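
A simple way to picture interoperation is part records that carry references to the machine cycles that touched them, so a cross-model query can trace final quality back to machines. The sketch below is illustrative; the identifiers and QA field are invented:

```python
# A minimal sketch of cross-model relationships: part records keep references
# to the machine cycles that touched them, so quality can be traced back
# across models.
from collections import Counter

parts = [
    {"part_id": "P100", "passed_qa": True,  "cycles": ["press-01:c1", "oven-02:c7"]},
    {"part_id": "P101", "passed_qa": False, "cycles": ["press-01:c2", "oven-02:c7"]},
    {"part_id": "P102", "passed_qa": False, "cycles": ["press-01:c2", "oven-02:c8"]},
]

# Cross-model query: failure rate by machine cycle, across every part touched.
touched, failed = Counter(), Counter()
for part in parts:
    for cycle in part["cycles"]:
        touched[cycle] += 1
        if not part["passed_qa"]:
            failed[cycle] += 1

for cycle in touched:
    print(f"{cycle}: {failed[cycle]}/{touched[cycle]} parts failed QA")
```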

Extending and changing models

What if another plant wants to use the analysis system, but their line has several machines that are different from yours? Understandably, they’d like the input from these components added to the original model. Similarly, it’s inevitable that at some point the initial plant will add or replace one or more machines, and engineers will want the analytics platform model modified accordingly. The solution for both situations is a “self-service” toolset that lets plant or IT engineers update, scale, and expand a given analysis project on their own, on the fly. This capability must also be built into the analytics engine. If not, any and every project change or expansion becomes yet another assignment for the IT development team.  
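
Conceptually, self-service extension means the model is driven by configuration that a plant engineer can edit directly, as in this illustrative sketch (the machine and signal names are invented):

```python
# A minimal sketch of self-service model extension: the line's machines are
# declared in configuration, so a plant engineer adds or swaps a machine by
# editing config rather than filing an IT development ticket.
line_config = {
    "line": "assembly-A",
    "machines": [
        {"id": "press-01", "signals": ["pressure", "cycle_time"]},
        {"id": "oven-02", "signals": ["zone1_temp", "zone2_temp"]},
        # Another plant's extra welder: one added entry, no code change.
        {"id": "welder-03", "signals": ["current", "wire_speed"]},
    ],
}

def build_model_fields(config):
    """Derive the model's expected input fields from configuration."""
    return [f"{m['id']}.{s}" for m in config["machines"] for s in m["signals"]]

print(build_model_fields(line_config))
```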

Phase 3. Data Analysis & Visualization

This stage involves building out front-end visualizations and workflows. These aspects are probably already on IT’s radar. Here are some considerations that may not be:

Integration with your existing systems

Upstream integration to acquire data from relevant systems is often the primary focus for an analytics POC, but when operationalized, many analytics will have downstream implications. For example, when the analysis generates an alert, it’s a best practice to route that to a work-order system. Compliance systems likewise need special alerts and notifications. Highly automated integration with these and other platforms is critical for ensuring the appropriate parties have access to the data they need, when they need it. As with traditional web systems that we take for granted, at the completion of a digital transformation effort, every data model of machines, parts, or batches should have standard APIs that enable IT teams to easily integrate with other systems.
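
For example, routing an alert into a work-order system might reduce to a single API call against a standard endpoint. The URL, payload fields, and response shape below are placeholders, not any specific vendor’s API:

```python
# A minimal sketch of downstream integration: when an analysis raises an
# alert, post it to a work-order system's API. The endpoint and payload
# fields are hypothetical placeholders.
import requests

WORK_ORDER_API = "https://cmms.example.com/api/work-orders"  # placeholder URL

def route_alert(machine_id: str, message: str, severity: str = "high"):
    """Open a work order for an analytics alert; raise on HTTP failure."""
    payload = {
        "asset": machine_id,
        "description": message,
        "priority": severity,
        "source": "streaming-analytics",
    }
    resp = requests.post(WORK_ORDER_API, json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()["id"]  # hypothetical response shape

# route_alert("press-01", "Bearing vibration exceeded control limit")
```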

Deploying new algorithms

Business requirements often change faster than software does. No sooner is an application deployed than process engineers and quality teams show up with requests for new or modified analytics. Bottom line: your platform needs to let plant operational and IT teams create and incorporate new algorithms at a rapid pace.
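
One pattern that supports this is a plug-in registry: new analytics register by name against the standard record shape and can be rolled out without touching the platform core. The sketch below is illustrative; the algorithm names and record fields are invented:

```python
# A minimal sketch of a plug-in registry for hot-swappable analytics.
ALGORITHMS = {}

def algorithm(name):
    """Decorator: register a function as a named, deployable analytic."""
    def register(fn):
        ALGORITHMS[name] = fn
        return fn
    return register

@algorithm("oee_estimate")
def oee_estimate(record):
    # Classic OEE: availability x performance x quality.
    return record["availability"] * record["performance"] * record["quality"]

@algorithm("scrap_flag")
def scrap_flag(record):
    return record["quality"] < 0.95

record = {"availability": 0.9, "performance": 0.85, "quality": 0.97}
for name, fn in ALGORITHMS.items():
    print(name, "->", fn(record))
```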

The Big Picture

All these challenges emerge when the true nature of the mission is understood: building not just a problem-specific application or pilot, but an end-to-end analytics platform that can scale across all your plants, and be readily extended to grow with your business. True digital transformation is not an ongoing series of one-off batch pilots. It is a multi-year endeavor to incorporate real-time information into all aspects of your operations, including production.

It’s hard to overstate the potential of digital analytics to improve manufacturing productivity, quality, and ultimately, profitability. At the same time, this is new technological territory for even the most sophisticated and experienced IT teams. It is therefore critical to carefully assess which elements are advisable to create in-house, what you can buy from a technology provider, and the kinds of outside services to engage.

A key question to consider: Are your business needs best served by investing the enormous capital resources, manpower, and development time required to build a scalable analytics platform yourself? Or as dozens of leading manufacturers have discovered, is it much more efficient to leverage the experience of the market, and go with a proven platform that delivers value today?

This proven platform is what Sight Machine offers. To see how it addresses the requirements, complexities, and nuances of real-time manufacturing analytics, visit our platform overview page.

Ryan Smith

VP of Product and Engineering at Sight Machine. Ryan has expertise in manufacturing, life sciences, and automation hardware and software. He has developed and implemented robotic inspection systems and real-time surgical navigation software at leading life sciences companies.
