Assessing Your Data’s Readiness
Being a data-driven manufacturer is all the rage. We engage daily with global manufacturers looking to better use production data to predict machine failure, optimize processes, or increase output.
Yet many of these manufacturing analytics projects stumble out of the gate. Here’s why.
Most manufacturers don’t understand the condition of their data. The industry is challenged with a heterogeneous data landscape where data varies by sources, formats, structure and storage location. The ability to join, blend and integrate multiple sources of manufacturing data into digital twins of production processes, lines, plants, and parts is a foundational requirement for being a data-driven manufacturer. Unfortunately, the quality of a manufacturer’s data and its readiness to support analysis is often only evaluated and understood after a project has begun.
At Sight Machine, we often begin a project by assessing a manufacturer’s data readiness to understand the level of effort required to render data useful for a particular use case, model, and analysis.
We’ve found that performing a comprehensive data audit is the most crucial step for ensuring a project is successful. In fact, after this audit, we typically end up coaching manufacturers on rescoping project objectives, expectations, and timelines.
Data Accessibility Does Not Equal Data Readiness
Having data accessible (for example, stored in the cloud) doesn’t mean the data is ready. These days, we are seeing more and more manufacturers collecting data and pushing it into data lakes. There is a common misconception among manufacturers that having centrally stored data will allow them to execute quickly. However, although moving data to a data lake increases its accessibility, it does not ensure the data can integrate with other sources. It’s important to assess data in centralized data lakes before beginning digital manufacturing initiatives.
In a previous blog post, we provided an overview of the concepts of data readiness and fitness. Although data fitness can impact the scope of projects, we find that most manufacturers struggle with qualifying the readiness of data for use in manufacturing analytics applications.
At Sight Machine, we’ve developed a process for auditing production data to better understand its ability to deliver business impact. The Sight Machine Data Readiness Audit examines the attributes we’ve found to be most critical for integrating production data in real-time with other sources.
An audit tells manufacturers several things:
- Will the data enable real-time integration with other sources for analysis?
- What will it take to make the data suitable for integration? How will project budgets and timelines need to be adjusted?
- What changes to their data capture strategies will enhance readiness?
A data source that passes the audit can be easily integrated with streaming production data from other sources and used in analysis. A source that fails the audit generally requires some level of data wrangling or, in some cases, is unsuitable for integration.
By providing a consistent methodology to assess data, manufacturers can establish a common vocabulary for describing the usability of data.
Auditing Data Readiness Attributes
There are five attributes that determine if a data source can be integrated with other real-time production data to create useful models to support decision making:
1. Accessibility:
Can the data be accessed in a repeatable, automatic way? A typical challenge with accessibility involves data housed in sources that are not on a network. Another common challenge is that the data cannot be acquired in real time because extraction requires manual processes. Remediating these challenges can significantly impact project timelines, so determining the priority and ROI of integrating these marginally ready data sources is vital.
2. Format/Protocol/Schema:
Is the data in a consistent, well-structured, and readable format suitable for ingestion into analytic applications? We often run into production data that is stored in complicated, undocumented SQL schemas or transmitted via unique protocols that require reverse engineering or custom data transformations. If the data is in proprietary but well-documented formats, data engineering teams will need to investigate the schema to enable translation. Data in undocumented, proprietary formats often requires significant resources to research and build the logic necessary to make it usable.
3. Asset Tagging:
Can the data be tied back to the physical sources involved in production? We often find that data is only partially labeled or is labeled manually. Data sources managed this way will require applications that assign the physical sources automatically. If the physical source of the data is not available or captured, the manufacturer will need to make infrastructure investments, such as installing new data collection systems.
4. Data Relatability:
Is there a systematic method for joining data sources, such as production timestamps and/or product serialization? In many cases, timestamps can only be inferred or the data contains partial serial number information. This requires the manufacturer to develop the business logic that defines relationships between the data and other sources. If no consistency in timestamps or serialization exists, the manufacturer will need to add serialization equipment or synchronize system clocks across equipment to establish and ensure the integrity of these relationships.
5. Institutional Knowledge of Data:
Are the resources available for mapping data to processes and physical sources? Institutional knowledge of the data is critical for building the data relationships and models that enable real-time analysis. Unfortunately, this is one of the most difficult attributes to objectively measure since the availability of resources can change during the course of the project. If personnel or documentation are not available, the manufacturer will need to perform data forensics to examine the data and build this knowledge base.
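Of the five attributes, data relatability is the easiest to illustrate mechanically. The Python sketch below, using only the standard library, pairs hypothetical part records (a serial number plus a production timestamp) with the machine reading nearest in time, as long as that reading falls within a tolerance. The `Reading` and `PartEvent` types, the `join_nearest` helper, and the 5-second tolerance are illustrative assumptions, not part of the Sight Machine audit.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    """Hypothetical machine reading: which asset produced it, when, and its value."""
    asset: str
    timestamp: datetime
    value: float


@dataclass
class PartEvent:
    """Hypothetical part record carrying a serial number and a production timestamp."""
    serial: str
    timestamp: datetime


def join_nearest(parts, readings, tolerance):
    """Pair each part with the reading closest in time, if within `tolerance`.

    Parts with no reading inside the tolerance window map to None, which
    surfaces a relatability gap (e.g. unsynchronized clocks) instead of hiding it.
    """
    readings = sorted(readings, key=lambda r: r.timestamp)
    times = [r.timestamp for r in readings]
    joined = {}
    for part in parts:
        i = bisect_left(times, part.timestamp)
        # Candidates: the reading just before and the one at/after the part time.
        candidates = [readings[j] for j in (i - 1, i) if 0 <= j < len(readings)]
        best = min(candidates, key=lambda r: abs(r.timestamp - part.timestamp), default=None)
        if best is not None and abs(best.timestamp - part.timestamp) <= tolerance:
            joined[part.serial] = best
        else:
            joined[part.serial] = None
    return joined


# Illustrative data: three readings from one press, two parts.
t0 = datetime(2024, 1, 1, 8, 0, 0)
readings = [Reading("press_1", t0 + timedelta(seconds=s), v)
            for s, v in [(0, 10.0), (30, 12.5), (60, 11.0)]]
parts = [PartEvent("SN-001", t0 + timedelta(seconds=28)),
         PartEvent("SN-002", t0 + timedelta(seconds=300))]
result = join_nearest(parts, readings, tolerance=timedelta(seconds=5))
print({k: (v.value if v else None) for k, v in result.items()})
# {'SN-001': 12.5, 'SN-002': None}
```

Here SN-002 has no reading within five seconds, which is exactly the kind of unjoinable record that points to missing clock synchronization or serialization.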
To support real-time manufacturing analytics, a data source must be ready in all five attributes. By understanding the readiness of each source, a manufacturer will be able to properly set project scope and timelines.
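Because readiness requires passing all five attributes, each source's audit result reduces naturally to a per-attribute checklist. The Python sketch below is one hypothetical way to record it; the attribute keys, the `SourceAudit` class, and the example sources are illustrative assumptions, and a real audit would also capture the remediation effort behind each failure rather than a bare pass/fail.

```python
from dataclasses import dataclass, field

# The five attributes from the audit; pass/fail scoring is a hypothetical simplification.
ATTRIBUTES = (
    "accessibility",
    "format_protocol_schema",
    "asset_tagging",
    "data_relatability",
    "institutional_knowledge",
)


@dataclass
class SourceAudit:
    """Hypothetical per-source audit record: one pass/fail flag per attribute."""
    name: str
    results: dict = field(default_factory=dict)

    def gaps(self):
        """Attributes that failed or were never assessed (unassessed counts as not ready)."""
        return [a for a in ATTRIBUTES if not self.results.get(a, False)]

    def is_ready(self):
        """A source is ready for real-time integration only if it passes all five."""
        return not self.gaps()


sources = [
    SourceAudit("historian", {a: True for a in ATTRIBUTES}),
    SourceAudit("quality_spreadsheets", {
        "accessibility": False,  # e.g. extracted by hand, not on a network
        "format_protocol_schema": True,
        "asset_tagging": False,  # e.g. partially, manually labeled
        "data_relatability": True,
        "institutional_knowledge": True,
    }),
]

for s in sources:
    status = "ready" if s.is_ready() else "needs work: " + ", ".join(s.gaps())
    print(f"{s.name}: {status}")
# historian: ready
# quality_spreadsheets: needs work: accessibility, asset_tagging
```

Treating an unassessed attribute as "not ready" is a deliberately conservative choice, so gaps in the audit itself show up as work to be scoped.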
Using a Self-Audit to Accelerate Execution
By going through the audit process, manufacturers gain a better understanding of their current data landscape. It also defines the requirements for using production data for real-time analysis. Performing an audit is the first step in building a data strategy.
Also, by simply understanding the characteristics that describe ‘ready’ data, manufacturing IT and operations teams can begin to implement simple changes in how data is captured and stored to improve future usability.