Identifying process influencing factors: which strategy?
Identifying process influencing factors has several benefits:
- understanding the process and factors that determine performance, and
- identifying which parameters to supervise in order to select the best operating ranges for optimized operating conditions.
Comprehensive vs rational approach
These two approaches often stand in opposition when identifying influential factors. Should all the data, or as much as possible, be taken into account, or should the data be passed through an “expert” filter to limit the scope of the study? The two approaches address different priorities:
- The first aims to avoid bias: expert scrutiny must not narrow the scope of the study according to its own priorities.
- The second approach aims to improve the quality of information used to maximize algorithm effectiveness.
Both arguments are sound, and the two approaches need not be opposed.
The objective: find a robust explanation for process inconsistency
It is important to go back to the objective to get the right balance. The objective is to identify process influencing factors that are not necessarily understood, but which have been observed and provide relevant information about the process in question. Mathematical and statistical models do, however, impose some unavoidable constraints here. The more factors studied, the higher the risk of:
- detecting coincidental correlations;
- reducing model efficiency in variable selection.
This is what “robust” means: algorithms are used to root out inconspicuous correlations, but study results must also be as consistent with reality as possible, regardless of the data set being studied.
Do not skip over pertinent parameters that have not yet been observed
There is no point using poor-quality data – e.g., a damaged sensor, or a sensor whose calibration has drifted seriously over time. Such data disrupts the analysis rather than improving it.
This aspect can, however, be controlled over time by applying a data quality approach, such as metrology monitoring, or by ensuring accuracy and noise levels compatible with the planned use of the data. This extends the scope of the approach.
Parameters often evolve in a coordinated manner due to the nature of the process observed. In this case, the natural correlation of parameters reduces model sensitivity. The estimated influence is likely to be diluted between these different factors. For example, for equipment such as evaporators, a number of measured parameters – temperatures, pressures, etc. – are completely interdependent because of the laws of thermodynamics. Only a few need to be controlled to set the value of the others.
Knowledge about the process being studied, as well as preliminary assessments of these correlations, makes it possible to identify groups of interdependent parameters and to select those that appear the most relevant.
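As a minimal sketch of this preliminary assessment (the column names and thresholds are hypothetical), interdependent parameter groups can be spotted from a correlation matrix, keeping one representative per group:

```python
import numpy as np
import pandas as pd

# Hypothetical evaporator data: pressure and saturation temperature are
# thermodynamically linked, so they move together; feed flow is independent.
rng = np.random.default_rng(42)
pressure = rng.normal(2.0, 0.2, 300)
data = pd.DataFrame({
    "pressure_bar": pressure,
    "sat_temp_C": 100 + 20 * pressure + rng.normal(0, 0.5, 300),
    "feed_flow_m3h": rng.normal(10, 1, 300),
})

corr = data.corr().abs()
# Flag parameter pairs whose |correlation| exceeds a chosen threshold.
threshold = 0.9
pairs = [(a, b) for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:] if corr.loc[a, b] > threshold]
print(pairs)  # pressure_bar and sat_temp_C form one interdependent group
```

Only one parameter per flagged group then needs to be kept in the study scope; expert knowledge decides which representative is the most relevant.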
Not all parameters can be controlled. Depending on the desired outcome, it may be possible to remove parameters that are not actionable and that cannot become actionable without changes to the process or investment. For example, a fluid temperature is measured, but it cannot be controlled because the necessary equipment – an exchanger – is missing.
Conversely, it might make sense to keep such data within the scope of the study if we plan to control it eventually. To act more immediately, limit the study to actionable parameters. Keeping all the parameters, regardless of their actionability, means that all possibilities are explored and certain environmental conditions are taken into account. The scope can therefore be adjusted depending on the expected objective.
Statistically, a large number of observations relative to the number of parameters analyzed is required to obtain relevant results – in particular, to avoid detecting coincidental correlations that do not correspond to any physical reality. If this condition is not met, interpret the results with caution and validate the identified correlations through expert knowledge and field tests.
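This risk is easy to demonstrate on purely random data (a sketch; the sample sizes and parameter count are illustrative): with few observations, some random parameters will correlate strongly with the target by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_spurious_corr(n_obs, n_params=50):
    """Largest |correlation| between a random target and random parameters."""
    X = rng.normal(size=(n_obs, n_params))
    y = rng.normal(size=n_obs)
    return max(abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_params))

few = max_spurious_corr(10)     # typically large: looks meaningful, but is noise
many = max_spurious_corr(1000)  # small: chance correlations wash out
print(few, many)
```

The first value can easily suggest a “significant” relationship where none exists; only the larger sample keeps chance correlations near zero.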
We combine a Machine Learning algorithm using tree ensembles with a combinatorial method based on game theory to identify process influencing factors. The objective of this approach is to provide a robust, state-of-the-art method for processing this type of data and determining the contribution of each variable to the process. These algorithms are computationally efficient (excluding data-retrieval time):
- the number of parameters studied (number of columns in the dataset) has little impact;
- calculation time grows roughly linearly with the number of observations used (number of rows in the dataset).
A further advantage is robustness with respect to some of the difficulties mentioned above:
- an insufficient number of observations;
- effective recognition of the influence of the various parameters.
Choosing the right approach
The approach we recommend (others are possible):
- Spend time exploring the data with visualization tools, addressing data quality, interdependence, and business purpose. This investment makes the approach more efficient.
- First, study a limited scope of parameters. Focus on robust, actionable data with no interdependence and a clear business purpose. This makes it possible to validate the share of inconsistency explained by these parameters and therefore direct efforts toward:
- controlling these parameters, if they explain a significant part of the inconsistency; or
- continuing the study.
- The next step is to expand the scope to increase the share of explained inconsistency: add data of poorer quality and potentially actionable parameters, eliminating interdependencies, until the explained share is sufficient.
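This stepwise widening of scope can be mimicked by a simple forward-selection loop (a sketch with hypothetical parameters: linear models for brevity, where the train R² is guaranteed not to decrease as parameters are added):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: start from a small trusted scope and widen it,
# tracking how much of the inconsistency (variance) the model explains.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)

scope = [0]                 # initial limited scope: robust, actionable parameter
candidates = [1, 2, 3, 4]   # poorer-quality or less actionable parameters
history = []
while candidates:
    history.append(LinearRegression().fit(X[:, scope], y).score(X[:, scope], y))
    # Widen the scope with the candidate that adds the most explanation.
    gains = {c: LinearRegression().fit(X[:, scope + [c]], y)
                                  .score(X[:, scope + [c]], y)
             for c in candidates}
    best = max(gains, key=gains.get)
    scope.append(best)
    candidates.remove(best)
history.append(LinearRegression().fit(X[:, scope], y).score(X[:, scope], y))
print(history)  # explained share never decreases as the scope expands
```

In practice, one would stop widening the scope once the gain in explained inconsistency no longer justifies the extra data-quality and interdependence work.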
Authors: Mathieu Cura, Christian Duperrier, Arthur Martel