Flotation App Data Quality 3.1

Using the fallback values concept to replace inconsistent values in Data Quality.

Written by Niel Knoblauch
Updated over a week ago

During an operational shift in the plant, process issues may arise that affect the identification of valuable production data. If any measuring equipment in the flotation circuit fails or does not transmit data correctly, there may not be enough visibility to propose parameters for process improvement.

For this purpose, the concept of Data Quality has been established: a set of processes and procedures applied to raw data to identify discrepancies, errors, or inconsistencies.

In summary, Data Quality refers to the practices and processes involved in verifying and enhancing the reliability of raw data, including the use of metrics and rules to address data discrepancies and maintain data integrity. The calculation works as follows:

  1. Metrics and Indices: Each data point in the application is identified by an index that records when the information was captured, allowing for temporal analysis. The data is divided into independent blocks for more precise analysis, with each block containing a specific number of data points.

  2. Anomaly Detection: Each data block is analyzed to identify anomalies, which are data points that stand out as different or unusual compared to others. The number of anomalies in each block and their duration are recorded.

  3. Badness Score: Based on the identified anomalies, a "data badness score" is calculated for each block. The higher the score, the worse the data quality in that block.

  4. Weighting of Past and Recent Times: The importance of data can vary depending on when it was recorded. Therefore, more weight is given to recent data, as it is considered more relevant.

  5. Final Score: By summing up the badness scores of all blocks, taking into account the temporal weighting, a final score is obtained that reflects the overall data quality (a simplified sketch of this calculation is shown below).
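As a rough illustration of steps 1 to 5, the Python sketch below splits a series into blocks, flags anomalies, scores each block, and combines the scores with a recency weight. The anomaly rule, block size, and decay factor are illustrative assumptions and not the Flotation App's actual implementation.

```python
import numpy as np

def block_badness(block: np.ndarray) -> float:
    """Fraction of points in a block flagged as anomalous.

    Assumption: a point is anomalous if it lies more than 3 standard
    deviations from the block mean (illustrative rule only).
    """
    if block.size == 0:
        return 0.0
    mean, std = block.mean(), block.std()
    if std == 0:
        return 0.0
    return float(np.mean(np.abs(block - mean) > 3 * std))

def final_badness_score(series: np.ndarray, block_size: int = 60,
                        decay: float = 0.9) -> float:
    """Combine per-block badness scores into one overall score.

    Older blocks are down-weighted by 'decay', mirroring step 4; the
    weighted scores are then combined into a single value as in step 5.
    """
    blocks = [series[i:i + block_size] for i in range(0, len(series), block_size)]
    if not blocks:
        return 0.0
    n = len(blocks)
    weights = np.array([decay ** (n - 1 - i) for i in range(n)])  # newest block weight = 1
    scores = np.array([block_badness(b) for b in blocks])
    return float(np.sum(weights * scores) / np.sum(weights))
```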

In version 3.0, the concept of Run Conditions based on fallback values is implemented. If any valuable input data for the model fails, whether due to a misread from uncalibrated, offline, or faulty equipment, the system immediately replaces the missing data with a fallback value that reflects the production of the specific site.
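A minimal sketch of what such a site-specific fallback configuration and substitution might look like is shown below; the tag names, values, and the helper apply_run_conditions are hypothetical and would be tuned to the production of the specific site.

```python
import math

# Hypothetical per-site fallback values used when an input fails
# (uncalibrated, offline, or faulty equipment). Tag names and values
# are illustrative only.
FALLBACK_VALUES = {
    "pulp_level_pct": 33.3,   # pulp level, in percent
    "air_flow_nm3h": 12.0,    # air flow, in Nm3/h
    "froth_depth_cm": 15.0,   # froth depth, in cm
}

def apply_run_conditions(tag: str, raw_value: float) -> float:
    """Return the raw reading if it is usable, otherwise the site fallback.

    Treating 0.0 as a failed reading mirrors the pulp level example in this
    article; real inputs may need a different validity check.
    """
    if raw_value is None or math.isnan(raw_value) or raw_value == 0.0:
        return FALLBACK_VALUES.get(tag, raw_value)
    return raw_value
```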


In the example below, we demonstrate how this concept works. The chart illustrates how Data Quality identification takes place. Here, we consider an input related to pulp level; the highest peak on the chart corresponds to status 2.1, where the Data Quality 3.0 concept is applied.

In our example, this means there was an inconsistency in the pulp level input data, such as the measurement and control instrument malfunctioning at that moment. Consequently, the input value dropped to zero during the periods when this occurred, as depicted by the blue line in the graph below:


From the moment the data quality error is triggered, indicating status 2.1, the fallback values are also activated, so that even when the model receives null input data it is fed values in accordance with the client's production and remains operational. Below, we present the behavior of the curves once the fallback values are activated:


The red line represents the moment when the fallback value of 33.3% is applied to the pulp level and fed into the model, even though the field input showed a value of zero. This ensures there is no interruption in the information generated by the Flotation App due to failures in the field measurement devices.
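In code terms, the behavior of the red line could be reproduced roughly as below, reusing the hypothetical apply_run_conditions helper from the earlier sketch; the readings are made up to mirror the chart.

```python
# Raw pulp level readings (%): zeros mark the periods where the
# field instrument failed (the blue line in the chart above).
raw_pulp_level = [34.1, 33.8, 0.0, 0.0, 0.0, 34.5]

# After substitution, the failed periods carry the 33.3% fallback instead
# of zero (the red line), so the model keeps receiving usable input.
corrected = [apply_run_conditions("pulp_level_pct", v) for v in raw_pulp_level]
print(corrected)  # [34.1, 33.8, 33.3, 33.3, 33.3, 34.5]
```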

To track the status of Data Quality, the system generates Data Quality Status Metrics as output. These metrics serve as indicators of whether a data quality error was present and which specific fallback strategy was employed when a fallback value was assigned. The metrics are coded as follows:

0 – Good data quality: This is the first metric level and indicates that the data is compliant and no data quality errors have been detected. In this case, no fallback value has been assigned, and the data is considered reliable and usable by the system.

1 – Warning: This is the second metric level and signals that there is a potential issue with data quality, but not a critical error that would block the system's operation. It may indicate that a specific fallback strategy has been applied to address data quality issues, such as missing or unreliable values.

2 – Blocking error: This is the third metric level and indicates that a critical data quality error has occurred, disrupting the normal operation of the system. In this case, a fallback strategy is employed to ensure that the system can continue to operate effectively even when primary data sources are unavailable.

2.1 – Fallback: This strategy involves assigning a float or another input metric as a fallback value when valuable input data for the model fails. It ensures that the model can continue to operate effectively even when primary data sources are unavailable.

2.2 – Clipping: In cases where the raw input value is either a NaN or below the lower limit defined in metrics.json, the clipping strategy assigns the lower limit of the data quality range as the fallback value. Conversely, when the raw input value exceeds the upper limit, it assigns the upper limit as the fallback value. This approach helps keep data within acceptable quality boundaries.

2.3 – Metric Average: This strategy calculates the average of defined input variables as fallback values. It provides a way to generate meaningful substitute data when individual data points are missing or unreliable.
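Taken together, statuses 2.1, 2.2, and 2.3 can be thought of as three substitution strategies chosen per input. The sketch below shows one assumed way such a dispatch could look; the function name, configuration fields, and the precedence between strategies are illustrative and do not represent the application's actual API.

```python
import math

def resolve_input(raw_value, config):
    """Return (value, status) for one raw input.

    'config' is assumed to hold, per input: 'limits' (the lower/upper
    bounds defined in metrics.json), 'fallback' (a float or the current
    value of another input metric), and optionally 'average_of' (current
    values of other inputs to average). Status codes follow this article:
    0 good, 2.1 fallback, 2.2 clipping, 2.3 metric average.
    """
    lower, upper = config["limits"]
    missing = raw_value is None or math.isnan(raw_value)

    # 2.3 - Metric Average: average other defined inputs when this one is unusable.
    if missing and config.get("average_of"):
        values = config["average_of"]
        return sum(values) / len(values), 2.3

    # 2.1 - Fallback: assign a float (or another input metric) when the input fails.
    if missing:
        return config["fallback"], 2.1

    # 2.2 - Clipping: out-of-range values are clipped to the data quality limits.
    if raw_value < lower:
        return lower, 2.2
    if raw_value > upper:
        return upper, 2.2

    # 0 - Good data quality: use the raw value as-is.
    return raw_value, 0
```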

Note that a Global Data Quality score is now output as well; it is calculated as the fraction of Data Quality Statuses that are < 2 across the active cells/lines.
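Assuming the per-cell statuses are available as a simple list, that fraction could be sketched as:

```python
# Hypothetical Data Quality Statuses for the active cells/lines.
statuses = [0, 0, 1, 2.1, 0, 2.2]

# Global Data Quality score: fraction of statuses below 2 (i.e. not blocking).
global_dq = sum(1 for s in statuses if s < 2) / len(statuses)
print(round(global_dq, 2))  # 0.67 -> two of the six cells hit a blocking error
```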
