To Succeed In Industry 4.0, AI, And The Internet Of Things, Forget (For A While) The Things
Want to start a bar fight among data types? Use the word “model.” The word means so many things. And the consequences – especially for business leaders – are serious.
A model can mean, among other ideas, a data model, a machine learning model, a schema, a simulation, a template of an object or process, or an integrated set of functions performed on data. A model can be applied to static data (the way 99.9%+ of models are built), or applied to streaming data. All are fundamentally different jobs.
To see how the language problem goes beyond academic debate and causes real problems, let’s consider the plight of a management team at a big industrial company. Charged with pursuing an Industry 4.0 mission, leadership forms a steering committee, develops long lists of use cases with factories, conducts pilots, and hires data scientists who, with dedication and skill, build very good models. Then, about two years later, comes a review, and the uncomfortable realization settles in: there has been no real progress.
This happens in almost every large manufacturing company. And it can’t be because everybody they asked for help is inept. To make the big projects succeed, it’s essential to take one giant step back.
What Are We Really Trying To Do?
Why do we talk about modeling industrial data in the first place? A good working definition of a “model” is “a representation of reality.” We model data because we want to see the reality we otherwise can’t know. Going back to our mythical company, the leadership team is just trying to find a way to represent what’s happening (and what will happen) in operations: fast, accurately, and inexpensively. That’s it – all the work they’re doing is basically an effort to answer one question: “What’s the best way to know what’s going on?” More precisely, from a data point of view: what’s the best way to model?
And here, around this seemingly clear concept of modeling, is where deep and costly problems lurk for Industry 4.0.
In most fields, modeling data usually starts with some sort of template. We design a framework for what we want to represent, and we place data into the template. And if we’re ambitious, we get the templates right for a lot of models and then go on to build very cool “graphs” that link the templates. That’s the tried-and-true way.
But obviously, when applied to Industry, something about this method breaks. The problem is subtle: it’s not that our tools aren’t good, or even that the data in plants is horrible to work with (which it is). The problem lies in the very idea of templates.
Templating is a top-down approach. We set up a representation of reality at the beginning of our work (at the “top” of the process) and put data into the framework. For example, we build a model (sometimes referred to as a Digital Twin) of a machine, a workflow, or whatever else we want to model. And when successful, after we’ve built a lot of Digital Twins, the approach ends up creating a sort of Great Library of Alexandria: all the knowledge of our industrial systems stored in a deep catalog of models.
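To make the top-down pattern concrete, here is a minimal sketch of what per-machine templating looks like in code. The machine types and field names are hypothetical, invented purely for illustration; the point is that every new kind of machine demands its own hand-built schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Top-down templating: one bespoke schema ("Digital Twin" template) per kind
# of machine. Machine types and fields below are hypothetical examples.

@dataclass
class FillerTwin:                 # template #1: a bottling-line filler
    machine_id: str
    timestamp: datetime
    bottles_filled: int
    fill_volume_ml: float
    nozzle_pressure_kpa: float

@dataclass
class LabelerTwin:                # template #2: different machine, different fields
    machine_id: str
    timestamp: datetime
    labels_applied: int
    glue_temp_c: float
    web_tension_n: float

# ...and so on: every new kind of machine in every plant needs another
# hand-built template, which is where the library starts to grow.
```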
This approach works in finance, sales, and marketing, and certainly for massive websites like Amazon, Google, and Netflix, because in those worlds the data is clean, structured, and consistent, and the number of templates is ultimately not overwhelmingly large.
The approach also works well with several classes of problems in Industry: (1) simulations and physics-based models of large, critical assets, (2) empirical models of enterprise-level workflows (for example, leaning out massive supply chains), and (3) empirical analysis of complex, repeating assets like trains and jet engines. Figuring out the templates and getting data into each one takes months per model, but because there are only a modest number of templates to manage, the template library gets the job done.
Why Templates Fail For Factories
But then there’s the big part of Industry 4.0 that everyone also wants to model, way over on the shop floor, where there are trillions of dollars of value to be unlocked and progress has been difficult. The AI and the tools people use there are just as good as anywhere else. But without a fundamental change in the approach to modeling industrial data, AI doesn’t scale – or it would have already – and neither does most of the rest of the Industry 4.0 data stack that everyone wants to use. The difference is that manufacturing unavoidably has thousands of kinds of machines to model. The template approach is about to hit the wall.
To illustrate, let’s go back to the real world and think about what it takes to make a single product, like a box of Raisin Bran. How many hundreds of machines across how many plants are needed to make oil and preservatives, process wheat, make paper from trees, then cardboard from paper, and then package it all up at high speed? Even when we wrestle very bad, difficult data by hand into templates and start building models as we do everywhere else, what we end up building is a bespoke library of hundreds of templates, just for a box of Raisin Bran. Now let’s model Cocoa Puffs, or go further afield to something like car production: 5,000 plants in the U.S., 30,000 parts per car, and who knows how many different kinds of machines to make a single car model. We haven’t even tried to find relationships between machines yet, or between process and quality, or looked at anything besides Raisin Bran and cars (pharmaceuticals, chemicals, steel, and hundreds of other industries are all out there waiting with a lot of data too), and the library just keeps exploding. Yuck. We know one global firm that started building models of every kind of machine it could find and eventually stopped at 5,000. Despite years of effort by skilled teams, the approach ultimately did not succeed.
Other technologists are more circumspect. They say “Here are some great tools. We’ll sell you these tools, you hire consultants, you know your business, you build the models.” This sounds reasonable, but it still just shifts the problem downstream to the manufacturer, who wakes up two years later with a few hundred very talented, moderately annoyed data scientists, many hundreds of templates that can’t talk to each other, and one highly frustrated leadership team. We’re back to 5,000 templates.
Reverse The Approach
What to do? Many technologists are now seeking the holy grail of modeling industrial data, often referred to as a Common Data Model: at its extreme, a single, universal way to represent the Matrix of data from all machines, all sensors, and all of manufacturing. That method does exist, and it’s powerful and counter-intuitive, but it’s not a common template of a machine or any other template we would expect. As is so often the case, nature provides a parallel. If top-down templating doesn’t work by itself, go bottom-up.
As an analogy, consider the endless variation among billions of human beings. We are each truly unique and special, and we all come into the world, develop, and live based on biological instructions created within us by four simple nucleotides: nitrogen-based building blocks of life. Think back to your high school biology class. Your unique and endlessly complex DNA is built from just four chemical bases: Adenine, Cytosine, Guanine, and Thymine. These bases form long chains and set up coded instructions across 23 pairs of chromosomes, which in turn contain the rules for billions of things going on in your body over the course of your life, across billions of unique beings. And although there’s a huge amount we don’t yet understand about DNA, in medicine we increasingly do go from the bottom up: to understand the “template” for the model we ultimately care most about – the human being – we look at the bases.
Modeling complex data is similar: to really work with it, we have to go down to a level that’s a bit like the bases, not just stick with the templates. We have to identify fundamental, common units of analysis that can build back up to the templates and be associated with them, and then we have to put it all together. And if we get the bases right, just as in nature, we can represent endlessly complex systems.
At Sight Machine, no matter what the machine, process, or industry may be, we transform data continuously into four basic building blocks of manufacturing. This enables massive scale and depth.
Just as with DNA bases, the blocks are standardized ways of capturing and transforming data from any factory data source into almost elemental units of information: (1) a representation of each unit of work, or cycle, performed by a machine (continuous or discrete), (2) downtime, (3) work done to materials and parts, from the point of view of each part as it travels through production, and (4) defects.
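As a rough illustration of how small these blocks are, they can be pictured as four generic record types that data from any machine maps into. The field names below are assumptions made for this sketch, not Sight Machine’s published schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Four generic building blocks, sketched as record types.
# Field names are illustrative assumptions, not a published schema.

@dataclass
class Cycle:            # (1) one unit of work by any machine
    machine_id: str
    start: datetime
    end: datetime
    stats: dict = field(default_factory=dict)      # sensor/PLC values for the cycle

@dataclass
class Downtime:         # (2) a period when a machine is not producing
    machine_id: str
    start: datetime
    end: datetime
    reason: Optional[str] = None

@dataclass
class Part:             # (3) work done to a part as it travels through production
    part_id: str
    cycle_ids: list = field(default_factory=list)  # cycles that touched this part

@dataclass
class Defect:           # (4) a quality problem tied to a part
    part_id: str
    defect_type: str
    detected_at: datetime
```

Because the same four types apply to a filler, a press, or a reactor, adding a new kind of machine means mapping its data into these records, not designing another template.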
These building blocks combine into representations of machines, digital threads, lines, enterprises, and supply chains, but because the building blocks are common and consistently structured, the template problem is gone. The blocks can all be rolled up, associated, compared, and analyzed with every mathematical technique from addition to AI. There is a lot of powerful machinery involved in mapping data into the blocks and analyzing them accurately, but as a structural alternative to templates, the blocks themselves are simple.
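To show what “rolling up” looks like in practice, here is a minimal sketch, assuming the hypothetical Cycle and Downtime records above, of a machine-level availability calculation that works the same way for every kind of machine:

```python
from datetime import timedelta

def availability(cycles, downtimes) -> float:
    """Fraction of observed time a machine spent producing rather than down.

    Operates only on the generic Cycle and Downtime blocks sketched above,
    so the same roll-up applies to any machine on any line, with no
    per-machine template. A simplified illustration, not a full OEE formula.
    """
    run = sum((c.end - c.start for c in cycles), timedelta())
    down = sum((d.end - d.start for d in downtimes), timedelta())
    total = run + down
    return run / total if total else 0.0
```

The same pattern extends upward: per-machine numbers can be aggregated into lines, plants, and supply chains because every level is built from the same few record types.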
With this approach, a single, consistent, and accurate data foundation is built, with blocks generated continuously as data flows from plants through Sight Machine’s Data Pipeline. It’s an approach that is proliferating across other complex data environments, like healthcare, and it is the future of complex data engineering and data science. In general, it’s fair to say that in the data realm, good products use both templates and building blocks, and in that way scale across enormous complexity. If you’re interested in learning more about how our building blocks work, download this paper.
And in the meantime, if you’re up for an entertaining exercise in conversations about data, the next time someone references a model, ask them what they mean and urge them to explain it in plain words. Keep track over time, and you’ll be amazed.