Automated Data Labeling (ADL) Uses Machine Learning Software and GPU Hardware to Automate High-Volume Data Preparation
Working with its partners NVIDIA and Microsoft, Sight Machine has achieved massive acceleration in preparing manufacturing data for analysis, enabling manufacturers to incorporate orders of magnitude more plant data into data foundations that capture production in digital form. These robust data foundations empower broad-reaching and impactful digital transformation initiatives.
Sight Machine’s Automated Data Labeling (ADL) software links Sight Machine’s manufacturing data platform with NVIDIA machine learning software running on NVIDIA GPU hardware, all optimized on Microsoft’s Azure infrastructure.
Sight Machine ADL makes it logistically feasible for companies with vast quantities of poorly understood production data to very quickly build a common data foundation representing their manufacturing operations.
In late 2021, Sight Machine and NVIDIA announced they were working together to accelerate manufacturing data processing. The fruits of that collaboration are now available in beta form, and the results are nothing short of astounding. Sight Machine is also a member of the newly created NVIDIA AI Accelerated program, which will help manufacturers rapidly deploy and innovate with AI-enabled solutions that address their greatest challenges.
Until now, even the most successful digital transformation projects in manufacturing have been able to focus on only a small fraction of the torrent of production data. ADL changes that: the resulting data foundation, built on real-time streaming data, enables continuous improvement in all the key metrics of production, including throughput, quality, and sustainability.
In partnership with NVIDIA and Microsoft, Sight Machine is working with a large agricultural processor and a large chemicals manufacturer, each of which has tens of thousands of point sources of data (often called tags) that must be labeled, or tagged, before the data can be used to improve production. Doing this labeling manually would require months of work by a team of data engineers, process experts, and other highly trained staff, assuming such staff were even available for this grinding routine. In practice, most companies doing manufacturing data analytics focus on a few dozen or a few hundred tags they believe are the most relevant, leaving a large untapped opportunity in the rest of their data.
See a presentation on ADL by Sight Machine Chief AI Officer and Co-Founder Kurt DeMaagd at GTC, NVIDIA’s AI developer conference.
Using Machine Learning to Prepare Data for Further AI Analysis
With ADL, Sight Machine and NVIDIA are bridging a gaping chasm between the popular understanding of what artificial intelligence can accomplish and how it really works. General-purpose big data and artificial intelligence technologies are typically unable to make sense of vast quantities of heterogeneous, unstructured data on their own. It is first necessary to understand where the data comes from and what it represents.
“Where things kind of fell off the rails in the initial big data revolution was this overemphasis on magic algorithms, where you could just take all your data regardless of quality, stuff it into that algorithm and supposedly actionable insights are going to come out,” says Kurt DeMaagd, Sight Machine Chief AI Officer and Co-Founder. “But frankly, this turned into a case study in the garbage-in-garbage-out problem, where the software created lots of false starts, where the analysis was difficult to understand, filled with spurious correlations, and end-users didn’t trust the data.”
A modern factory may generate streams of data from tens of thousands of sources, such as individual sensors, and large enterprises produce millions of streams. Many companies have been collecting industrial data for years in historians, data lakes, and data warehouses, assuming that once the data is aggregated they will be able to derive value from it. But although these large data pools improve accessibility, the data has been removed from its context, often making it difficult to know what it represents, translate analysis into actionable insight, or build organizational trust in the results.
For example, consider an actual tag name we have encountered: “268-IE-A-2ND-FFA-LC.” Although this cryptic name follows a rigorous naming convention from the original machine builder, its meaning is not intuitively obvious. With moderate effort, an engineer may be able to find the definition in the machine’s specification, but doing this manually for tens of thousands of tags would require far too much time and manpower to be feasible.
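To make the problem concrete, here is a minimal sketch of what decoding such a name against a machine builder’s convention looks like. The glossary entries below are invented for illustration; the article does not reveal what the real tokens mean, and in practice the mapping would come from the builder’s specification documents.

```python
# Hypothetical sketch: split a delimiter-separated tag name into tokens and
# look each token up in a glossary built from machine specifications.
# All token meanings below are invented for illustration only.

GLOSSARY = {
    "IE": "instrument element (hypothetical)",
    "FFA": "free fatty acid sensor (hypothetical)",
    "LC": "level controller (hypothetical)",
}

def tokenize_tag(tag: str) -> list[str]:
    """Split a tag name like '268-IE-A-2ND-FFA-LC' into its tokens."""
    return tag.split("-")

def describe_tag(tag: str) -> dict[str, str]:
    """Map each token to a glossary entry, flagging tokens we cannot decode."""
    return {tok: GLOSSARY.get(tok, "unknown token") for tok in tokenize_tag(tag)}

print(describe_tag("268-IE-A-2ND-FFA-LC"))
```

Even in this toy form, the weakness of the manual approach is visible: every machine builder has its own convention, so the glossary must be rebuilt per machine, which is exactly the labor ADL automates.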
ADL automates the labeling of all data points, taking Sight Machine’s decade of expertise in digitizing manufacturing and combining it with NVIDIA’s AI platform and expertise in deep learning, and with Microsoft’s end-to-end streaming and AI solutions for manufacturing.
NVIDIA GPUs (graphics processing units, a type of silicon processor originally developed for high-speed graphics) and NVIDIA’s GPU-optimized AI/ML libraries are ideal for crunching large volumes of data efficiently. ADL relies on AI techniques such as natural language processing, ensemble methods, cluster analysis, principal component analysis (PCA), and other forms of data payload analysis.
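As a rough illustration of the name-analysis side of this, the sketch below assigns a tag to an asset by comparing character trigrams of its name against a few already-labeled tags. This is a pure-Python toy with invented tag names and asset labels; ADL’s production pipeline uses NVIDIA’s GPU-accelerated libraries and far richer techniques than this single nearest-neighbor comparison.

```python
# Toy name-based tag-to-asset mapping: tags whose names share many character
# trigrams are assumed to belong to the same asset. The labeled examples and
# asset names below are invented for illustration.

def trigrams(name: str) -> set[str]:
    """Character trigrams of a tag name, case-normalized."""
    name = name.upper()
    return {name[i:i + 3] for i in range(len(name) - 2)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two trigram sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical labeled examples (asset names invented for illustration).
LABELED = {
    "268-IE-A-2ND-FFA-LC": "refining line 2",
    "268-TT-A-2ND-FFA-PV": "refining line 2",
    "401-PMP-OUT-FLOW": "pump station 4",
}

def predict_asset(tag: str) -> str:
    """Return the asset label of the most name-similar labeled tag."""
    grams = trigrams(tag)
    best = max(LABELED, key=lambda t: jaccard(grams, trigrams(t)))
    return LABELED[best]

print(predict_asset("268-IE-B-2ND-FFA-LC"))
```

The quadratic number of pairwise comparisons implied by this approach is also why, as discussed below, GPU acceleration matters once tag counts reach the tens of thousands.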
“Because manufacturers haven’t known how to organize this data in the past, they haven’t really been able to use it, so they only work with a small subset of their data,” DeMaagd says. “This is a way of taking that much larger set of data, unlocking it, and then making it available for analysis.”
Many companies have now exhausted the value of their smaller data sets. By giving them the ability to manage comprehensive data sets, Sight Machine ADL enables them to identify the variables (machine settings, timings, and other levers) that are most linked to improved quality, faster throughput, and reduced resource usage.
ADL benefits both from the ability of AI to learn to identify the sources of data and from the powerful acceleration in processing speed offered by GPU hardware. On standard CPUs (central processing units, like typical Intel server processors), an analysis of this size takes many hours; with Sight Machine software, NVIDIA libraries, and NVIDIA GPU hardware in the Microsoft Azure cloud, the same analysis might take 30 seconds.
Improving Accuracy and Accelerating Sight Machine Customer Success
ADL also promises to speed up Sight Machine’s onboarding of new customers, enabling Sight Machine to incorporate roughly 1,000 times more data into its analysis of manufacturing performance.
“It should be a big unlock for Sight Machine internally because a lot of the time that we currently spend on tag-to-asset mapping will be automated,” DeMaagd says. “It’s a good accelerator for our project implementation.”
In its initial form, ADL looked at tag names only and achieved an accuracy rate of over 80% in correctly assigning tags to the process area that generated them. The second stage of analysis, now underway, adds data introspection, which DeMaagd expects will increase accuracy to over 90%.
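The two-stage idea can be sketched as a tiny weighted ensemble: one vote from the tag name, one from inspecting sample values. The rules and weights below are invented for illustration; ADL’s actual introspection models are far more sophisticated.

```python
# Hedged sketch of combining name analysis with data introspection by
# weighted vote. The classification rules and vote weights are invented.

def name_vote(tag: str) -> tuple[str, float]:
    """Toy rule: tags containing 'TT' are temperature sensors (hypothetical)."""
    return ("temperature", 0.8) if "TT" in tag else ("unknown", 0.2)

def data_vote(samples: list[float]) -> tuple[str, float]:
    """Toy rule: values in a plausible Celsius band suggest temperature."""
    if samples and all(-40 <= v <= 400 for v in samples):
        return ("temperature", 0.6)
    return ("unknown", 0.4)

def ensemble(tag: str, samples: list[float]) -> str:
    """Sum the weighted votes and return the winning label."""
    votes: dict[str, float] = {}
    for label, weight in (name_vote(tag), data_vote(samples)):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=lambda k: votes[k])

print(ensemble("268-TT-A-2ND-FFA-PV", [71.2, 71.5, 70.9]))
```

The intuition for the accuracy gain is that the two signals fail in different ways: a misleading name can be overruled by the data, and noisy data can be overruled by a clear name.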
“By switching over to NVIDIA GPU hardware, we’re seeing somewhere around a 100x speed-up in processing time…and at that point, the sky’s the limit for what you do.”
By comparison, a chemicals company had tried to do similar work manually on a smaller set of tags, relying on its subject matter experts. The team achieved about 80% accuracy, at a significant cost of staff time over two months. Such a manual process does help unlock the value of the data, but at substantial financial and opportunity cost, and it was not scalable across the organization.
Running this tag-to-asset mapping on conventional CPU hardware is possible, though slow, up to about 10,000 or 20,000 tags. “But once you break past that number, the compute time starts to get into the hours, and so the only way to make this feasible is to do optimization such as running on the GPU,” DeMaagd says. “By switching over to NVIDIA GPU hardware, we’re seeing somewhere around a 100x speed-up in processing time. This has really enabled this project, where we can try something, run it, and boom, a few minutes later, your analysis comes back immediately instead of 30 minutes later.”
“And at that point,” he says, “the sky’s the limit for what you do.”