Identifying process influencing factors has several benefits:
These two approaches are often in opposition when identifying influential factors. Should all the data, or as much of it as possible, be taken into account, or should the data be passed through an “expert” filter to limit the scope of the study? The two approaches address different priorities:
Both arguments are sound and should not be set against each other.
It is important to go back to the objective to strike the right balance. The objective is to identify process influencing factors that are not necessarily understood, but that have been observed and provide relevant information about the process in question. There are, however, some unavoidable constraints on the mathematical and statistical models used in this case. The more factors studied, the higher the risk of:
This is what “robust” means: the algorithms are used to uncover inconspicuous correlations, but the study’s results must also be as consistent with reality as possible, regardless of the data set being studied.
There is no point in using poor-quality data, e.g. data from a damaged sensor or from a sensor whose calibration has drifted significantly over time. Such data disrupts the analysis rather than improving the process.
This aspect can, however, be controlled over time by applying a data quality approach, such as metrology monitoring or measuring with accuracy and noise levels compatible with the planned use of the data. This extends the scope of the approach.
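As an illustration of what such basic checks can look like in practice, here is a minimal sketch, assuming the sensor history is available as a pandas DataFrame with one column per measurement; the indicators and thresholds are placeholder assumptions, not a prescribed data quality method:

```python
import pandas as pd

def basic_quality_report(df: pd.DataFrame, flat_window: int = 50) -> pd.DataFrame:
    """Flag obvious data quality issues for each sensor column."""
    report = pd.DataFrame(index=df.columns)
    # Share of missing samples for each sensor.
    report["missing_ratio"] = df.isna().mean()
    # A "stuck" sensor keeps repeating the same value over long stretches.
    report["stuck_ratio"] = df.rolling(flat_window).std().eq(0).mean()
    # Slow drift shows up as a large gap between early and late averages.
    n = len(df)
    report["drift"] = (df.iloc[-n // 4:].mean() - df.iloc[: n // 4].mean()).abs()
    return report

# Example (hypothetical file and 20% threshold): drop sensors with too many gaps.
# measurements = pd.read_csv("process_history.csv", index_col="timestamp")
# report = basic_quality_report(measurements)
# usable = measurements.drop(columns=report[report["missing_ratio"] > 0.2].index)
```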
Parameters often evolve in a coordinated manner due to the nature of the process observed. In this case, the natural correlation between parameters reduces the model’s sensitivity: the estimated influence is likely to be diluted across these different factors. For example, in equipment such as evaporators, a number of measured parameters – temperatures, pressures, etc. – are completely interdependent because of the laws of thermodynamics. Only a few need to be controlled to set the value of the others.
Knowledge about the process being studied, as well as preliminary assessments of these correlations, makes it possible to identify groups of interdependent parameters and to select those that appear the most relevant.
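One simple way to make such a preliminary assessment is to cluster parameters on their pairwise correlations. The sketch below is one possible implementation, assuming the measurements sit in a pandas DataFrame; the 0.9 threshold is an arbitrary assumption:

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def correlated_groups(df: pd.DataFrame, threshold: float = 0.9):
    """Group columns whose absolute correlation exceeds `threshold`."""
    corr = df.corr().abs()
    # Turn correlation into a distance (0 = perfectly correlated) for clustering.
    distance = 1.0 - corr
    condensed = squareform(distance.values, checks=False)
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=1.0 - threshold, criterion="distance")
    groups = {}
    for column, label in zip(corr.columns, labels):
        groups.setdefault(label, []).append(column)
    return groups

# Example: keep only one representative parameter per strongly correlated group.
# groups = correlated_groups(measurements)
# representatives = [columns[0] for columns in groups.values()]
```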
Not all parameters can be controlled. Depending on the desired outcome, it may be possible to remove parameters that are not actionable and that cannot become actionable without process changes or investment. For example, a fluid temperature is measured, but we have not been able to control it because the necessary equipment, an exchanger, is missing.
Conversely, it might make sense to keep such data within the scope of the study if we plan to make it controllable eventually. If more immediate levers for action are preferred, limit the study to actionable parameters. Keeping all the parameters, regardless of their actionability, means that all possibilities are explored and certain environmental conditions are taken into account. The scope can therefore be adjusted depending on the expected objective.
Statistically, a large number of observations relative to the number of parameters analyzed is required to obtain relevant results, in particular to avoid detecting coincidental correlations that do not correspond to any physical reality. If this condition is not met, the results must be interpreted with caution, and the correlations identified must be validated through expert knowledge and field tests.
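The risk of coincidental correlations is easy to demonstrate on purely random data. In the illustrative sketch below (the sample sizes are arbitrary), when there are far more parameters than observations, some parameter almost always appears noticeably correlated with the target even though everything is pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def max_spurious_correlation(n_obs: int, n_params: int) -> float:
    """Largest |correlation| between a random target and random parameters."""
    X = rng.standard_normal((n_obs, n_params))
    y = rng.standard_normal(n_obs)
    correlations = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_params)]
    return max(correlations)

# With few observations and many parameters, chance alone produces strong correlations.
print(max_spurious_correlation(n_obs=30, n_params=5))    # typically around 0.2-0.4
print(max_spurious_correlation(n_obs=30, n_params=500))  # often above 0.5
```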
We combine a Machine Learning algorithm using tree ensembles with a combinatorial method based on game theory to identify process influencing factors. The objective of this approach is to provide a robust, state-of-the-art method for processing this type of data and determining the contribution of each variable to the process. These algorithms have short calculation times (excluding data recovery time):
The advantage is robustness with respect to a number of the points mentioned above:
The approach we recommend (knowing that others are possible):
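Purely as an illustration of the general technique named above (a tree ensemble whose predictions are attributed to the input variables with a game-theoretic method, i.e. Shapley values), here is a minimal sketch using scikit-learn and the shap package; the file name, column names, and model choice are assumptions on our part, not the authors' actual implementation:

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Assumed layout: one row per observation, sensor tags as columns,
# and a process indicator (e.g. yield) as the quantity to explain.
data = pd.read_csv("process_history.csv")   # hypothetical file
X = data.drop(columns=["yield"])            # candidate influencing factors
y = data["yield"]                           # process indicator to explain

# Tree ensemble capturing non-linear relationships between factors and the indicator.
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shapley values (game theory) attribute each prediction to the input factors.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Average absolute contribution gives a global ranking of influencing factors.
importance = pd.Series(abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```

Averaging the absolute Shapley values gives a global ranking; the per-observation values can also be inspected to check that an identified influence remains consistent across the data set, which is one way to probe the robustness discussed above.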
Authors: Mathieu Cura, Christian Duperrier, Arthur Martel