Of these, the supervised model is a better choice for developing soft sensors or creating predictive tags.
Although there are hundreds of supervised machine learning models, only a handful of them—from a category known as regression algorithms—are useful for creating soft sensors. Following is a description of each:
Linear Regression: This is one of the simplest and most useful ways to create a soft sensor. The algorithm generates a function that predicts the value of a target variable as a linear combination of one or more input variables. When a single variable is used, it is called a univariate linear regression; with multiple variables, it is a multivariate linear regression. However, certain processes, such as measuring the viscosity of a polymer, are too complex to capture with a linear regression. The benefit of this model is its clarity: it is easy to determine which of the variables have the greatest effect on the target, which is known as feature importance.
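As a rough illustration, the sketch below fits a multivariate linear regression with scikit-learn. The process measurements and lab values are hypothetical placeholders, not data from any real plant; the fitted coefficients give the feature-importance view described above.

```python
# Minimal sketch of a multivariate linear regression soft sensor (scikit-learn).
# X holds hypothetical process measurements (e.g., temperature, pressure, flow);
# y holds the lab-measured quality value the soft sensor should predict.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[350.0, 2.1, 12.5],
              [355.0, 2.3, 12.1],
              [348.0, 2.0, 13.0],
              [360.0, 2.4, 11.8],
              [352.0, 2.2, 12.4]])
y = np.array([0.82, 0.88, 0.79, 0.91, 0.85])

model = LinearRegression().fit(X, y)

# The fitted coefficients show how strongly each input drives the target,
# which is the feature-importance view mentioned above.
print(model.coef_, model.intercept_)
print(model.predict([[353.0, 2.2, 12.3]]))
```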
Decision Tree: A decision tree builds rules from the independent variables, known as a group of features, and theoretically it can have as many rules and branches as it needs to fit the data. The result is a piecewise constant estimate of the target value. Because they can have many rules and branches, decision trees are very flexible, but they also carry a risk of overfitting the data. Overfitting happens when a model is trained for too long, allowing it to adapt to noise in the dataset, which it begins to treat as normal. Underfitting can also occur: the algorithm is not trained long enough and therefore has not learned enough from the data to determine how the independent variables relate to the target variable, or what influence they have on it. Both overfitting and underfitting lead to model failure: the model no longer generalizes to new data and cannot be used for a soft sensor. These concepts are not unique to the decision tree model.
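A minimal sketch of a decision-tree soft sensor is shown below, on synthetic stand-in data. Limiting the tree depth and comparing train and test scores are simple, common ways to watch for the over- and underfitting described above; the parameter values are illustrative, not recommendations.

```python
# Sketch of a decision-tree soft sensor (scikit-learn); the data below are
# synthetic stand-ins for real historian readings and lab values.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # hypothetical process measurements
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)  # hypothetical lab-measured target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Capping the depth limits how many rules and branches the tree can grow,
# which is one simple guard against overfitting.
tree = DecisionTreeRegressor(max_depth=4).fit(X_train, y_train)

# A large gap between train and test scores suggests overfitting;
# low scores on both suggest underfitting.
print("train R^2:", tree.score(X_train, y_train))
print("test  R^2:", tree.score(X_test, y_test))
```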
Random Forest: This is essentially a combination of multiple decision tree models in a single model. It offers more flexibility, can handle more features and gives better predictive capability than a single tree. However, like any tree-based model, it still carries a risk of overfitting the data.
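The following sketch, again on synthetic stand-in data, fits a random forest and reads out the feature importances averaged over its trees; the number and depth of trees shown are arbitrary illustrative choices.

```python
# Sketch of a random-forest soft sensor (scikit-learn); data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                                       # hypothetical process measurements
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)       # hypothetical target variable

# An ensemble of many trees, each with limited depth to curb overfitting.
forest = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=0)
forest.fit(X, y)

# Impurity-based feature importances, averaged across all trees in the forest.
print(forest.feature_importances_)
print(forest.predict(X[:3]))
```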
Gradient Boosting: In machine learning, gradient boosting is often referred to as an ensemble model. Like a random forest, it combines multiple decision trees, but it differs in that the trees are built sequentially, with each new tree fit to reduce the remaining loss of the trees before it. These models can be very effective, but they become harder to interpret as more trees are added.
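Below is a hedged sketch of gradient boosting on synthetic data. The number of trees, learning rate and depth are illustrative assumptions only; in practice they would be tuned for the process at hand.

```python
# Sketch of a gradient-boosting soft sensor (scikit-learn); data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                                        # hypothetical process measurements
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)  # hypothetical target variable

# Each new tree is fit to reduce the remaining loss (squared error by default);
# the learning rate controls how much each tree contributes.
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=0)
gbr.fit(X, y)

print(gbr.predict(X[:3]))
```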
Neural Network: The concept known as deep learning is built on neural network models. Applied to a regression problem, a neural network takes the input variables and generates a value for the target variable. The most basic neural network is the multilayer perceptron; in its simplest arrangement, a single layer of neurons connects the inputs to the output. More often, a neural network will have an input layer, one or more hidden layers (each with multiple neurons) and an output layer that produces the value.
Within each neuron in a hidden layer, the values of the weighted inputs are added up and passed through an activation function (such as the sigmoid function), which is what makes the model non-linear. The signal then passes through the remaining layers until it reaches the output layer, which for a regression problem contains a single neuron. The weights and biases that best fit the features and target values are determined while the model is being trained.
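A minimal sketch of such a network, using scikit-learn's MLPRegressor with an illustrative layer layout and a sigmoid (logistic) activation, is shown below; the data are synthetic placeholders.

```python
# Sketch of a neural-network (multilayer perceptron) soft sensor using scikit-learn;
# layer sizes, activation and data are illustrative assumptions only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))                                          # hypothetical process measurements
y = np.tanh(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=500)  # hypothetical target variable

# Scaling the inputs keeps the weighted sums in a range where the
# activation function behaves well.
X_scaled = StandardScaler().fit_transform(X)

# Two hidden layers of neurons and a logistic (sigmoid) activation,
# with a single output neuron for the regression target.
mlp = MLPRegressor(hidden_layer_sizes=(16, 8), activation="logistic",
                   max_iter=2000, random_state=0)
mlp.fit(X_scaled, y)

print(mlp.predict(X_scaled[:3]))
```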
Collaborative design
A common misconception among those new to machine learning is that there is a single correct model for a given need. That is not the case. Choosing one model over another is a complex decision that rests partly on the experience of the data scientist.
Furthermore, none of these supervised regression models will produce the same results each time it is trained. There is therefore no such thing as a “best” model, although some suit certain situations better than others.
Collaboration between data scientists and operations experts on any machine learning exercise begins with a mutual understanding of the parameters involved, the targeted use and the method of development and deployment.
With a solid understanding of these algorithms, engineers can make important contributions to soft sensor design.
Eduardo Hernandez is customer success manager at TrendMiner.