Using data science in the plant

The arrival of Data Science and Machine Learning is providing many opportunities to improve performance in the plant. But disappointments are not uncommon.
This article explores obstacles that prevent real advances and suggests ways to address them.

Using data science in the plant: unrealistic hopes and a lack of engagement

One of the most common problems with technology is the expectation that it will solve problems simply and effortlessly. The term “Artificial Intelligence” only heightens this expectation. Machine Learning and Data Science are no exception: despite the hopes they raise, they are not miracle solutions. Only structured, business-oriented approaches deliver concrete results. These powerful approaches require:

  • relevant, high-quality data on the subject to be addressed;
  • business expertise to develop approaches that give meaning to the objective; and
  • skills in Data Science and Machine Learning to handle these tools with precision while observing results objectively.

Commitment is required so that resources are made available throughout the rollout, guided by a system-wide view of what success requires.

Using data science in the plant: define the objective and expectations

Such a project requires specific, well-defined objectives and deliverables. The purpose must be production performance, pursued through levers that have a direct impact on the plant. It is therefore essential to build a project team that involves end users and the leaders who drive performance, and the objective must reflect their expectations. A clear objective makes it possible to measure the quality of the final results using relevant metrics.

Building data sets: tedious and time-consuming

The starting point for any modeling exercise is to build a data set adapted to the objective. Building such a data set takes a long time: data scientists and analysts say it can take between 40 and 70% of their time. The complexity of this task is due to:

  • data dispersed in multiple and diverse systems;
  • different data granularities and structures: for example, time series and traceability data;
  • unreliable data quality;
  • insufficient records;
  • etc.

Tools and an appropriate organization must be put in place to collect, centralize and store data over time in a structure that reflects the business. Aggregation steps that are meaningful to the business must also be included.
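
As an illustration of what reconciling time series with traceability data can involve, the sketch below aggregates sensor readings to batch level, so that each batch record carries a summary of the measurements taken while it was in production. All names and values here (the temperature readings, the batch identifiers, the `aggregate_by_batch` function) are hypothetical and only serve the example; a real plant would read from its own systems and choose aggregations that suit its process.

```python
from datetime import datetime
from statistics import mean

# Hypothetical time-series readings: (timestamp, oven temperature in °C)
readings = [
    (datetime(2024, 1, 8, 6, 0), 181.0),
    (datetime(2024, 1, 8, 6, 30), 183.0),
    (datetime(2024, 1, 8, 7, 0), 179.0),
    (datetime(2024, 1, 8, 7, 30), 175.0),
    (datetime(2024, 1, 8, 9, 0), 190.0),  # outside every batch window
]

# Hypothetical traceability records: one row per production batch
batches = [
    {"batch_id": "B001", "start": datetime(2024, 1, 8, 6, 0), "end": datetime(2024, 1, 8, 7, 0)},
    {"batch_id": "B002", "start": datetime(2024, 1, 8, 7, 0), "end": datetime(2024, 1, 8, 8, 0)},
]


def aggregate_by_batch(readings, batches):
    """Return one row per batch summarizing the readings taken in [start, end)."""
    rows = []
    for batch in batches:
        values = [v for ts, v in readings if batch["start"] <= ts < batch["end"]]
        rows.append({
            "batch_id": batch["batch_id"],
            "n_readings": len(values),
            # None flags a batch with no sensor data (an "insufficient records" case)
            "mean_temp": mean(values) if values else None,
        })
    return rows


dataset = aggregate_by_batch(readings, batches)
for row in dataset:
    print(row)
```

The reading that falls outside every batch window is simply ignored here; in practice such orphan records are worth counting, since they often point to the data quality problems listed above.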

Developing models: a collective and multidisciplinary task

Developing models and analyses to meet the objective obviously requires both data and skilled data scientists. But that is not enough. Regular interaction with experts in the field, to observe progress on the floor, is essential to ensure that the solution stays aligned with the identified needs.

The approach also benefits from including different perspectives from across the company: production, R&D, methods, processes, quality, and continuous improvement. Business knowledge is a valuable addition to information extracted from data, enabling the development of more powerful, relevant, and extensible models.

Implementing models in the field

There is no point in developing the best model if it is never used. But there are several obstacles to rolling out in the field. For this to be effective, the following are necessary:

  • a continuous source of input data to supply the model in real conditions;
  • management of model execution, retrieval of results, and their delivery to users in a form compatible with their uses, such as visualization and alerts; and
  • training for field teams so that they can use the new information, interpret it, and act appropriately.
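
To make these requirements concrete, here is a minimal sketch of a single run step: an input record arrives, the model is executed, and the result is packaged in a form that a dashboard or alerting tool could consume. The model itself (`predict_scrap_rate`), the field names, and the alert threshold are assumptions made for the example, not a trained model or a recommended setting.

```python
def predict_scrap_rate(record):
    """Placeholder for a deployed model: a hypothetical linear rule, not a trained model."""
    return 0.01 + 0.002 * max(0.0, record["temp_c"] - 180.0)


def run_once(record, threshold=0.03):
    """Execute the model on one input record and package the result for users."""
    prediction = predict_scrap_rate(record)
    return {
        "line": record["line"],
        "predicted_scrap_rate": round(prediction, 4),
        # The threshold is an assumption; in practice it is agreed with field teams
        "alert": prediction > threshold,
    }


# Hypothetical records standing in for a continuous feed from the plant
incoming = [
    {"line": "L1", "temp_c": 182.0},
    {"line": "L1", "temp_c": 195.0},
]

results = [run_once(record) for record in incoming]
for result in results:
    print(result)
```

Keeping the prediction and the alert flag in the same record lets field teams see why an alert was raised, which supports the interpretation and training effort described above.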

Maintaining models over time

Establishing the model in the field is just the beginning. A lot of adjustment is required to achieve the desired results. Take care that the complexity of real conditions does not undermine users' expectations at the outset. At this stage, regular dialog is vital to ensure that the objectives are achieved and the results are robust.

In the long term, operational teams need to take the models into account in their own projects, such as a new recipe, a new product, or changes to the production line or processes. It is important to check the impact on the models and, if necessary, adjust them with the teams that designed them.
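
One simple way to check whether a change on the line has affected a model's inputs is to compare recent data with the data the model was built on. The sketch below measures how far the recent mean of one input has moved from its reference mean, expressed in reference standard deviations. The function names, sample values, and review threshold of 3 are assumptions chosen for illustration; real monitoring would usually look at several inputs and at model errors as well.

```python
from statistics import mean, stdev


def mean_shift_score(reference, recent):
    """Shift of the recent mean from the reference mean, in reference standard deviations."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)


def needs_review(reference, recent, threshold=3.0):
    """Flag the model for review when the input has shifted beyond the (assumed) threshold."""
    return mean_shift_score(reference, recent) > threshold


# Hypothetical oven temperatures (°C) seen when the model was developed
reference_temps = [180.0, 182.0, 178.0, 181.0, 179.0]

# Hypothetical recent values: unchanged process vs. after a new recipe
recent_same_recipe = [181.0, 179.0, 180.0]
recent_new_recipe = [190.0, 192.0, 191.0]

print(needs_review(reference_temps, recent_same_recipe))
print(needs_review(reference_temps, recent_new_recipe))
```

A flag raised by a check like this is a prompt for dialog between operational teams and model designers, not an automatic verdict that the model is wrong.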

The company must develop tools and create the internal structures and organization required to roll out models quickly and maintain them to ensure long-term added value.

In summary


  • Stay pragmatic – focus on collaborative and iterative approaches that target the business issue being addressed.
  • From the outset, take the necessary steps to have data sets for model development, as well as data flows to supply model execution in the run phase. This is what our Data Lake Process does.
  • Check constantly between model design and execution in the field so that the models stay aligned with business uses.
  • Adopt clear processes to guide model design and maintenance.

Author: Mathieu Cura