Assessing Your Data’s Readiness
We engage daily with global manufacturers looking to better use production and related data to predict machine failure, optimize processes, increase output or reduce energy consumption.
A majority of these manufacturing transformation projects stumble out of the gate because of a poor data foundation. Here’s why.
Many manufacturers don’t understand the condition of their data. The industry is challenged with a heterogeneous data landscape where data varies by source, format, structure, and storage location. The ability to join, blend and integrate multiple sources of manufacturing data into digital twins of production processes, lines, plants, and parts is a foundational requirement for being a data-driven manufacturer. Unfortunately, the quality of a manufacturer’s data and its readiness to support analysis are often evaluated and understood only after a project has begun.
At Sight Machine, we often begin a project by assessing a manufacturer’s data readiness to understand the level of effort required to render data useful for a particular use case, model, and/or analysis.
We’ve found that performing a comprehensive data assessment is the most crucial step for ensuring a project is successful. In fact, after this assessment, we typically end up coaching manufacturers on rescoping project objectives, expectations, and timelines.
Manufacturing Data Accessibility Does Not Equal Data Readiness
Having data accessible, such as stored in the cloud, doesn’t imply data readiness. These days, we are seeing more and more manufacturers collecting data and pushing it into data lakes. There is a common misconception among manufacturers that having centrally stored data will allow them to execute quickly. However, although moving data to a data lake increases its accessibility, it does not necessarily ensure the data can integrate with other sources. It’s important to assess data in centralized data lakes before beginning digital manufacturing initiatives.
Evaluating Readiness
In a previous blog post, we provided an overview of the concepts of data readiness and fitness. Although data fitness can impact the scope of projects, we find that most manufacturers struggle with qualifying the readiness of data for use in manufacturing analytics applications.
At Sight Machine, we’ve developed a process for auditing production data to better understand its ability to deliver business impact. The Sight Machine Data Readiness Assessment examines the attributes we’ve found to be most critical for integrating production data in real-time with other sources.
An assessment answers several questions for manufacturers:
- Will the data enable real-time integration with other sources for analysis?
- What will it take to make the data suitable for integration?
- How will project budgets and timelines need to be adjusted?
- What changes to their data capture strategies will enhance readiness?
A data source that passes the assessment can be easily integrated with streaming production data from other sources and used in analysis. If a data source fails the assessment, it generally means some level of data wrangling is required, or in some cases, the data is unsuitable for integration.
A consistent methodology for assessing data gives manufacturers a common vocabulary for describing the usability of data.
The Attributes that Determine Data Readiness
There are six attributes that determine if a data source can be integrated with other real-time production data to create useful models to support decision-making:
1. Accessibility:
Can the data be accessed in a repeatable, automated way? A typical challenge with accessibility involves data housed in sources that are not on a network. Another common challenge is that the data cannot be acquired in real time because extraction requires manual processes. Remediating these challenges can significantly affect project timelines, so determining the priority and ROI of integrating these marginally ready data sources is vital.
2. Format/Protocol/Schema:
Is the data in a consistent, well-structured, and readable format required for ingestion into analytic applications? We often run into production data that is in complicated, undocumented SQL schemas or transmitted via unique protocols that require reverse engineering or custom data transformations. If the data is in proprietary but well-documented formats, data engineering teams will need to investigate the schema to enable translation. Data in undocumented, proprietary formats often requires significant resourcing to research and build the logic necessary to make it usable.
3. Asset Tagging:
Can the data be tied back to physical sources involved in the production? We often find that data is partially labeled or is labeled manually. Data sources that are managed this way will require the development of applications to assign the physical sources in an automated fashion. If the physical source of the data is not available or captured, the manufacturer will need to make infrastructure investments, such as the installation of new data collection systems.
4. Data Relatability:
Is there a systematic method for joining data sources such as production timestamps and/or product serialization? In many cases, timestamps can only be inferred or the data contains partial serial number information. This requires the manufacturer to develop the business logic to define relationships between the data and other sources. If no consistency in timestamps or serialization exists, the manufacturer will need to add serialization equipment or add system clock synchronization across equipment to establish and ensure the integrity of these relationships.
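As a rough illustration of what a systematic timestamp-based join can look like, here is a minimal sketch in Python. It is not Sight Machine's method. The function name nearest_join, the sample records, and the two-second tolerance are all assumptions made up for this example. It pairs each serialized part with the sensor reading closest in time and leaves the part unmatched when no reading falls within the tolerance.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_join(left, right, tolerance):
    """Pair each left record with the right record closest in time.

    left, right: lists of (timestamp, payload) tuples.
    right must already be sorted by timestamp.
    Returns a list of (left_payload, right_payload_or_None).
    """
    right_times = [ts for ts, _ in right]
    joined = []
    for ts, payload in left:
        i = bisect_left(right_times, ts)
        # Only the neighbors just before and just after can be the closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(right)]
        best = min(candidates, key=lambda j: abs(right_times[j] - ts), default=None)
        if best is not None and abs(right_times[best] - ts) <= tolerance:
            joined.append((payload, right[best][1]))
        else:
            # No reading close enough: the relationship cannot be established.
            joined.append((payload, None))
    return joined


# Hypothetical sample data: sensor readings and serialized parts.
t0 = datetime(2024, 1, 1, 8, 0, 0)
sensor = [(t0 + timedelta(seconds=s), {"temp_c": v})
          for s, v in [(0, 71.2), (10, 71.9), (20, 72.4)]]
parts = [(t0 + timedelta(seconds=s), sn)
         for s, sn in [(1, "SN-001"), (11, "SN-002"), (45, "SN-003")]]

result = nearest_join(parts, sensor, timedelta(seconds=2))
unmatched = [sn for sn, reading in result if reading is None]
```

In this sketch, a large share of unmatched records would be one sign that timestamps are inconsistent across equipment, which is the situation where clock synchronization or added serialization becomes necessary.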
5. Institutional Knowledge of Data:
Are the resources available for mapping data to processes and physical sources? Institutional knowledge of the data is critical for building the data relationships and models that enable real-time analysis. Unfortunately, this is one of the most difficult attributes to objectively measure since the availability of resources can change during the course of the project. If personnel or documentation are not available, the manufacturer will need to perform data forensics to examine the data and build this knowledge base.
6. Data Security:
In an increasingly hostile cyber-security environment, making sure systems and the data from those systems stay secure is paramount for any organization. IT/Security and Operations teams must work closely together to build the appropriate safeguards to protect these assets, which sometimes run older software or use outdated security methods.
Ideally, your data infrastructure includes centralized management and control of endpoints to ensure the latest software and encryption technology updates are applied, allowing for safe data transfer and seamless reconfiguration and rotation of secrets. Where these types of configurations may not be pragmatic, network isolation policies can offer “best case” security. Knowing the security capabilities and current state of your systems is key to developing a security strategy and roadmap for your enterprise’s valuable data and assets.
To support real-time manufacturing analytics capabilities, a data source must be ready in all six attributes. By understanding the readiness of each source, a manufacturer will be able to properly set project scope and timelines.
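The all-six requirement can be sketched as a simple per-source scorecard. This is an illustrative assumption, not the format of the Sight Machine Data Readiness Assessment: the class name, field names, and the pass/fail simplification of each attribute are invented for this example, and a real assessment may grade each attribute in more detail.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessAssessment:
    """Hypothetical pass/fail record of the six attributes for one data source."""
    accessibility: bool
    format_protocol_schema: bool
    asset_tagging: bool
    data_relatability: bool
    institutional_knowledge: bool
    data_security: bool

    def gaps(self):
        """Names of the attributes that are not yet ready."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_ready(self):
        """A source is ready only when every attribute passes."""
        return not self.gaps()


# Hypothetical source: everything passes except asset tagging.
plc_historian = ReadinessAssessment(
    accessibility=True,
    format_protocol_schema=True,
    asset_tagging=False,
    data_relatability=True,
    institutional_knowledge=True,
    data_security=True,
)

ready = plc_historian.is_ready()
open_gaps = plc_historian.gaps()
```

Listing the gaps per source, rather than only a single ready or not-ready flag, is what lets a team estimate the remediation work behind each source when setting scope and timelines.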
Using a Self-Assessment to Accelerate Execution
By going through the assessment process, manufacturers gain a better understanding of their current data landscape. The process also defines the requirements for using production data for real-time analysis. Performing an assessment is the first step in building a data strategy.
Also, by simply understanding the characteristics that describe ‘ready’ data, manufacturing IT and operations teams can begin to implement simple changes in how data is captured and stored to improve future usability.
You can download a copy of Sight Machine’s Data Readiness Assessment here. For more information on how both manufacturing data readiness and data fitness can impact your projects, read a previous blog here.