Volume 17 Number 1
February 2020
Cite as: Chang-Hao Zhu and Jie Zhang. Developing Soft Sensors for Polymer Melt Index in an Industrial Polymerization Process Using Deep Belief Networks. International Journal of Automation and Computing, vol. 17, no. 1, pp. 44-54, 2020. doi: 10.1007/s11633-019-1203-x

Developing Soft Sensors for Polymer Melt Index in an Industrial Polymerization Process Using Deep Belief Networks

Author Biography:
  • Chang-Hao Zhu received the B. Sc. degree in mechanical engineering from Shandong University, China in 2014, and the M. Sc. degree in electrical power from Newcastle University, UK in 2016. Currently, he is a Ph. D. candidate in chemical engineering at the School of Engineering, Newcastle University, UK. His research interests include process control, machine learning, and the development of data-driven soft sensors and their applications in industrial chemical processes. E-mail: c.zhu5@newcastle.ac.uk  ORCID iD: 0000-0003-1801-0787

    Jie Zhang received the B. Sc. degree in control engineering from Hebei University of Technology, China in 1986, and the Ph. D. degree in control engineering from City University, UK in 1991. He is a Reader in the School of Engineering, Newcastle University, UK. He has published over 300 papers in international journals, books and conferences. He is a senior member of IEEE, and a member of the IEEE Control Systems Society, the IEEE Computational Intelligence Society, and the IEEE Industrial Electronics Society. He is on the editorial boards of a number of journals, including Neurocomputing published by Elsevier. His research interests are in the general areas of process systems engineering, including process modelling, batch process control, process monitoring, and computational intelligence. E-mail: jie.zhang@newcastle.ac.uk (Corresponding author)  ORCID iD: 0000-0002-9745-664X

  • Received: 2019-03-15
  • Accepted: 2019-09-18
  • Published Online: 2019-11-05



Abstract: This paper presents the development of soft sensors for the polymer melt index in an industrial polymerization process using deep belief networks (DBN). Melt index, an important quality variable of polypropylene, is hard to measure online in industrial processes, and this lack of online measurement instruments hinders polymer quality control. One effective solution is to use soft sensors to estimate the quality variables from process data. In recent years, deep learning has achieved many successful applications in image classification and speech recognition. Owing to its deep architecture, DBN has strong generalization capability for modelling complex dynamic processes and can meet the modelling accuracy demands of actual processes. In contrast to conventional neural networks, the training of a DBN comprises an unsupervised training phase followed by a supervised training phase. To mine the valuable information in process data, a DBN can be trained in the unsupervised phase on process data without pre-existing labels, which improves its estimation performance. The selection of DBN structure is investigated in the paper, and the modelling results achieved by DBN and feedforward neural networks are compared. It is shown that the DBN models give very accurate estimations of the polymer melt index.

    • Much work on soft sensors in the area of process control has been done in the past few decades, and the technique is widely implemented in industrial chemical processes. Soft sensors are very effective for estimating important product quality variables that cannot be measured effectively in industrial processes. In process control, issues with hardware instruments, such as unavailability or high cost, hinder product quality control. To overcome these issues, empirical models can be developed from process operational data obtained from real industrial processes. With such models, difficult-to-measure quality variables can be estimated from easy-to-measure process variables[1]. This modelling approach based on historical process data has become increasingly popular in chemical processes in recent years. Such data-driven models can effectively reduce production costs and improve efficiency in industrial processes.

      Much successful research on process modelling based on multivariate statistical techniques was carried out in the last century. Principal component analysis (PCA) was proposed by Pearson in 1901[2] and further developed by Harold Hotelling in the 1930s[3, 4]. Based on PCA, principal component regression (PCR) and partial least squares (PLS) emerged as useful modelling methods that address the problem of co-linearity among the input variables[2]. Data-driven soft sensors based on PCR can be developed by using principal components as the predictor variables. As an improvement over PCR, PLS regression models the process data and quality data at the same time[5]. PLS was first introduced by Wold et al.[6] and further developed by Wold. There have been many applications of the PLS technique in process modelling. One limitation of PLS and PCR is that they are both linear techniques, so they are not very effective when applied to nonlinear process modelling.

      With the development of machine learning, much research on developing soft sensors based on machine learning techniques has been reported in the past few years. There are many successful process modelling techniques based on machine learning, such as support vector machines (SVM) and artificial neural networks (ANN). McCulloch and Pitts[7] proposed the original neural network in the 1940s. Two decades later, with the vast improvement in computing capability, neural networks became a popular research topic. The back-propagation algorithm was applied to ANN by Werbos[8] in 1975. The advantage of ANN is that they can approximate any nonlinear function and give very good performance in the estimation and prediction of quality data. The backpropagation algorithm can deal with exclusive-or problems; during backpropagation training, the network weights between neurons are modified by distributing the errors back from the output layer[8]. However, conventional ANN suffer from local optima and a lack of generalization capability. SVM can achieve accessible optima in training even with little training data[9], but when SVM is applied to processes with large amounts of modelling data, the computational burden increases. In 2006, Hinton et al.[10] first introduced deep learning. The deep belief network (DBN) is one of the most well-known data-driven modelling techniques based on deep learning. Established with a deep architecture, it shows strong generalization capability in modelling highly nonlinear processes. Deep learning has many applications in speech recognition and image classification[11]. The DBN training procedure has two phases: unsupervised training followed by supervised training. Before supervised training, a DBN captures more information from nonlinear process input data to achieve more accurate prediction or estimation of quality data. It has shown significant performance in many other applications[12, 13].

      In this study, soft sensors for polymer melt index (MI) are established using DBN and applied to an industrial polypropylene polymerization process. By using deep learning techniques, large amounts of industrial process data samples without pre-existing labels can be used by DBN models in the unsupervised training phase, whereas such data are of no use in training conventional feed-forward neural networks, which use supervised training only. These process data samples help the DBN model adjust its weights into a desirable region; the information in the process data is captured during unsupervised training. It is shown in this paper that DBN models give very accurate estimations of MI.

      The rest of this paper is organized as follows. An introduction to ANN is given in Section 2. In Section 3, the DBN model and the main principles of restricted Boltzmann machines (RBMs) and back-propagation are introduced. Section 4 introduces the case study of an industrial polypropylene polymerization process. The selection of DBN model architectures is discussed and the polymer melt index estimation results are given in Section 5. Section 6 summarizes the conclusions of this paper.

    • The feed-forward neural network is one of the most well-known machine learning techniques. It can be used to solve many problems in prediction, classification, and pattern recognition. Much research on ANN has been reported in the past decades. In its initial form, the simple perceptron invented by McCulloch and Pitts, the model calculates the weighted sum of the input variables and then passes it to an activation function. Fig. 1 shows a simple perceptron structure.

      Figure 1.  Model of simple perceptron

      As can be seen from Fig. 1, $ {x_i}, \; i = 1,2, \cdots ,n$, are the input variables and $ {w_j},\;j = 1,2, \cdots ,n$, are the corresponding weights for these input variables. McCulloch and Pitts[7] used the threshold function as the activation function and proved that universal computations can be performed by simple perceptrons if the weights are chosen appropriately. However, many complicated systems cannot be represented by this method[14]. Many other activation functions can be used, such as the Heaviside step function, the sigmoid function and the Gaussian function. Activation functions are sometimes also called transfer functions in ANN research. The most popular activation function is the sigmoid function.

      The characteristic of the sigmoid function is that it is an “S”-shaped curve as shown in Fig. 2.

      Figure 2.  “S”-shaped curve of sigmoid function

      The sigmoid function maps its input into the region from 0 to 1. In Fig. 2, the output approaches 1 as x approaches +∞, and approaches 0 as x approaches –∞. It has appropriate asymptotic properties. The sigmoid function is given by (1):

      $ S\left( x \right) = \frac{1}{{1 + {{\rm e}^{ - \beta x}}}} $

      (1)

      where x represents the sum of weighted input values and β is a slope parameter.

      The structure of an ANN can be regarded as neurons arranged into inter-connected layers, with weighted connections between neurons in adjacent layers. There are principally two types of ANN: feed-forward networks and recurrent networks. Feed-forward networks have no feedback connections from the network outputs, whereas recurrent neural networks have feedback connections. In this work, feed-forward networks are also used for soft sensor development in the polypropylene polymerization process for comparison with DBN.

      Multilayer perceptrons are the most classic type of feed-forward networks. They can deal with more complicated problems than simple perceptrons. Neurons in adjacent layers are connected unidirectionally, without feedback loops. A multilayer perceptron model has at least three layers: commonly an input layer, one or more hidden layers and an output layer. The relationship between the network input and output variables is learnt during supervised training and stored in the trained network weights. The structure of a multilayer perceptron with two hidden layers is shown in Fig. 3.

      Figure 3.  Multilayer perceptron

      Each unit in the input layer is a network input. The output of a unit in the hidden or output layer is calculated by passing the sum of the weighted outputs of the previous layer to an activation function as follows:

      $ {O_j} = f\left( {\mathop \sum \limits_{i = 1}^n {w_{ij}}{I_i} + {b_j}} \right) $

      (2)

      where Oj is the output value of the unit j in a particular layer, wij is the weight between this unit and the i-th unit of the immediate previous layer, Ii is the i-th input of this unit (i.e., the output value of the i-th unit in the previous layer), bj is a bias, and f is the activation function. During network training, the weights and biases will be initialized as random values typically in a range between –0.1 and 0.1. Network weights are adjusted by using training algorithms to minimize the error terms between network outputs and target labels. After training, the relationship between system input variables and output variables can be represented by the trained neural networks.
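      As an illustration of (1) and (2), the following minimal numpy sketch propagates an input vector through a multilayer perceptron like that of Fig. 3; the layer sizes and random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(x, beta=1.0):
    """Sigmoid activation of (1); beta is the slope parameter."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def layer_output(inputs, weights, biases):
    """Output of one layer per (2): O_j = f(sum_i w_ij * I_i + b_j).

    inputs:  (n,) outputs of the previous layer
    weights: (n, m) weights w_ij connecting n previous units to m units
    biases:  (m,) bias b_j of each unit
    """
    return sigmoid(inputs @ weights + biases)

# Toy forward pass through two hidden layers, as in Fig. 3
# (dimensions are illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=3)                            # 3 network inputs
w1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros(5)
w2, b2 = rng.normal(size=(5, 4)) * 0.1, np.zeros(4)
w3, b3 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)
y = layer_output(layer_output(layer_output(x, w1, b1), w2, b2), w3, b3)
print(y)
```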

      The training of a multilayer feedforward neural network is supervised, and the most commonly used supervised training algorithm is backpropagation. Multilayer feedforward neural networks can model nonlinear processes; however, the polymerization process is highly nonlinear, and the structure of commonly used multilayer neural networks is shallow. When a feedforward neural network with more than three layers is trained by backpropagation, the model often suffers from poor generalization, so this modelling technique cannot meet the demanded estimation accuracy. To achieve more accurate estimation of MI, DBN models, which have a deep architecture and stronger generalization capability, are established in this study.

    • The limitation of traditional neural networks is that they usually have shallow structures, typically no more than three layers. With this limitation, a shallow neural network may not achieve satisfactory estimation performance when applied to highly nonlinear industrial processes, and actual industrial processes are commonly highly nonlinear. The shallow architecture of feed-forward neural networks can lead to a lack of representation capability[15, 16]. To approximate the various regions of a process, the model needs more hidden neurons in its hidden layers. Recent research suggests that networks with a deep structure can achieve reliable results[15]. DBN has been successfully applied to many research areas, such as classification and recognition[17]. In a DBN model, several restricted Boltzmann machines (RBMs) are stacked and combined into one learning network, giving the DBN a deep structure based on a deep learning technique. Fig. 4 presents the basic architecture of a DBN.

      Figure 4.  Architecture of DBN

      The DBN shown in Fig. 4 has five layers: an input layer, an output layer and three hidden layers. In Fig. 4, W denotes the network weights, and b and c are the network biases. A DBN can be considered as a stack of RBMs, with each pair of adjacent layers forming a single RBM. In contrast to the traditional Boltzmann machine, the neurons within a layer of a DBN are not connected to each other; adjacent layers, however, have symmetrical connections between them. The hidden layer units are binary units and the visible input layer units are Gaussian units. The first training phase is unsupervised: the process operational data are used to train the DBN model without any target variables involved. The unsupervised training helps the DBN mine more correlations than a feed-forward neural network can, and the weights are adjusted into a desirable region before the supervised training phase. After unsupervised training, the DBN is fine-tuned by the backpropagation algorithm in the supervised training phase.

    • In the 1980s, Smolensky[18] developed the restricted Boltzmann machine. Hinton et al.[10] later developed the DBN by stacking RBMs as its layers, as shown in Fig. 4.

      To understand the basics of the RBM, the joint probability function of the visible and hidden units needs to be introduced first. Equation (3) shows the probability function

      $ P\left( {{ v},{ h}} \right) = \dfrac{\exp\left\{ { - {Energy}\left( {{ v},{ h}} \right)} \right\}}{Z} $

      (3)

      where Z represents a normalizing factor, v is the vector of visible units and h is the vector of hidden units. The probability P(v, h) increases when the energy function decreases. In an RBM, the energy function is given by

      $ Energy\left( {{ v},{ h}} \right) = - {{ b}^{\rm T}}{ v} - {{ c}^{\rm T}}{ h} - {{ h}^{\rm T}}{ W}{ v} $

      (4)

      where W, b and c are the parameters of the function. It should be noted that both v and h are binary-valued. Binary RBMs are used as the hidden layers of a DBN model; however, they cannot handle continuous variables. To overcome this issue, (4) can be extended to the energy function of a Gaussian RBM:

      $ Energy\left( {{ v},{ h}} \right) = \mathop \sum \limits_i \frac{{{{\left( {{v_i} - {a_i}} \right)}^2}}}{{2\sigma _i^2}} - {{ c}^{\rm T}}{ h} - {{ h}^{\rm T}}{ W}{ v} $

      (5)

      where ai is the mean and σi the standard deviation of the Gaussian distribution for the i-th input neuron. In practical applications, the input data samples are commonly normalized to zero mean and unit variance, so (5) reduces to

      $ Energy\left( {{ v},{ h}} \right) = \frac{1}{2}{{ v}^{\rm T}}{ v} - {{ b}^{\rm T}}{ v} - {{ c}^{\rm T}}{ h} - {{ h}^{\rm T}}{ W}{ v}. $

      (6)

      Hinton[19] also described other forms of RBM, but the DBN in this paper only uses Gaussian and binary RBMs.
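      For concreteness, the energy functions (4) and (6) can be evaluated directly, as in the minimal sketch below; the dimensions and random parameter values are illustrative assumptions only.

```python
import numpy as np

def binary_rbm_energy(v, h, W, b, c):
    """Energy of a binary RBM, as in (4)."""
    return -(b @ v) - (c @ h) - (h @ W @ v)

def gaussian_rbm_energy(v, h, W, b, c):
    """Energy of a Gaussian-visible RBM with normalized inputs, as in (6)."""
    return 0.5 * (v @ v) - (b @ v) - (c @ h) - (h @ W @ v)

# Illustrative sizes: 4 visible units, 3 hidden units.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(3, 4))
b, c = np.zeros(4), np.zeros(3)
v = rng.normal(size=4)           # normalized (zero-mean, unit-variance) inputs
h = rng.integers(0, 2, size=3)   # binary hidden states
print(gaussian_rbm_energy(v, h, W, b, c))
```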

    • The objective of training an RBM is to maximize the probability P(v), which can be achieved by minimizing the energy function. In Gibbs sampling, h can only be sampled given v of the visible layer. Based on previous work, the gradient at a visible point v can be formulated as

      $ \begin{split} \frac{\partial \log P\left( { v} \right)}{\partial \theta } &= \frac{\partial \log \displaystyle\sum\nolimits_{ h} P\left( {{ v},{ h}} \right)}{\partial \theta } =\\ & \frac{\displaystyle\sum\nolimits_{ h} {\rm e}^{ - Energy\left( {{ v},{ h}} \right)}\left( \dfrac{\partial \left[ { - Energy\left( {{ v},{ h}} \right)} \right]}{\partial \theta } \right)}{\displaystyle\sum\nolimits_{ h} {\rm e}^{ - Energy\left( {{ v},{ h}} \right)}} - \frac{\displaystyle\sum\nolimits_{{\tilde{ v}}} \sum\nolimits_{ h} {\rm e}^{ - Energy\left( {{\tilde{ v}},{ h}} \right)}\left( \dfrac{\partial \left[ { - Energy\left( {{\tilde{ v}},{ h}} \right)} \right]}{\partial \theta } \right)}{\displaystyle\sum\nolimits_{{\tilde{ v}}} \sum\nolimits_{ h} {\rm e}^{ - Energy\left( {{\tilde{ v}},{ h}} \right)}} =\\ & \sum\limits_{ h} P\left( {{ h}|{ v}} \right)\frac{\partial \left[ { - Energy\left( {{ v},{ h}} \right)} \right]}{\partial \theta } - \sum\limits_{\tilde{ v}} \sum\limits_{ h} P\left( {{\tilde{ v}},{ h}} \right)\frac{\partial \left[ { - Energy\left( {{\tilde{ v}},{ h}} \right)} \right]}{\partial \theta } \end{split} $

      (7)

      where $ \theta = \left\{ {{ W},{ b},{ c}} \right\}$ is the vector of network parameters. Computing the positive term in (7) is easy because v is known, but computing the negative term is intractable. Contrastive divergence overcomes this issue by approximating the negative term and offers an effective solution[20, 21]. Training an RBM starts with a training vector on the visible units. Hidden units $ {{ h}^{\left( { t} \right)}}$ are then generated from $ { v}^{\left( {{ t} - 1} \right)}$ by Gibbs sampling, and the visible units $ {{ v}^{\left( { t} \right)}}$ are updated from $ {{ h}^{\left( { t} \right)}}$; this forms a Markov chain. After infinitely many iterations of Gibbs sampling, samples of the visible units v(∞) and hidden units h(∞) are obtained, and the correlation of v(∞) and h(∞) can be measured after sampling for a long time. In practice, however, just one iteration of Gibbs sampling achieves a satisfactory result and the learning algorithm works well.
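      The following is a minimal numpy sketch of a CD-1 weight update for a binary RBM, following the one-step Gibbs sampling scheme just described; the batch handling, sizes and learning rate are illustrative choices, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01, rng=np.random.default_rng()):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: (batch, n_visible) training vectors on the visible units.
    Returns the updated parameters (W, b, c).
    """
    # Positive phase: sample h(0) from the data v(0).
    ph0 = sigmoid(v0 @ W + c)                          # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # Gibbs sample
    # Negative phase: one Gibbs step to v(1), then P(h | v(1)).
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient of log P(v): data correlation minus model correlation.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Example: pretrain on a toy batch of binary vectors.
rng = np.random.default_rng(2)
v0 = (rng.random((8, 6)) < 0.5).astype(float)
W, b, c = rng.normal(scale=0.1, size=(6, 4)), np.zeros(6), np.zeros(4)
for _ in range(100):
    W, b, c = cd1_update(v0, W, b, c)
```

      To pretrain a whole DBN, this update is applied greedily layer by layer: each RBM is trained on the activations of the layer below before the next RBM is stacked on top.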

    • Back-propagation is the most commonly used supervised approach for training neural networks. After the unsupervised training phase, the back-propagation algorithm fine-tunes the whole network in the supervised training phase. The errors between the network outputs and the corresponding labels are computed and propagated back to the previous layers. Equation (8) gives the output-layer error terms

      $ { E}{ r}{{ r}_j} = {{ O}_j}\left( {1 - {{ O}_j}} \right)\left( {{{ T}_j} - {{ O}_j}} \right) $

      (8)

      where Oj represents the network output for a training sample and Tj is the corresponding target value for the j-th output neuron. The error term of a hidden layer is formulated as

      $ { E}{ r}{{ r}_j} = {{ O}_j}\left( {1 - {{ O}_j}} \right)\mathop \sum \limits_k { E}{ r}{{ r}_k}{{ w}_{jk}} $

      (9)

      where wjk is the weight connecting unit j of the last hidden layer to unit k of the output layer, and Errk is the error term of the output layer. During training, weight updating proceeds from the output layer back to the input layer. The weight update formulas are given as

      $ {{ w}_{ij}} = {{ w}_{ij}} + \eta { E}{ r}{{ r}_j}{{ O}_i} $

      (10)

      $ {{ c}_j} = {{ c}_j} + \eta { E}{ r}{{ r}_j}\quad\quad $

      (11)

      where η is the learning rate of the training process, and wij and cj are the weights and biases, respectively. The learning rate needs to be properly selected: a large learning rate may overshoot the minimum, whereas a small learning rate usually leads to slow training.
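      The update rules (8)-(11) can be written in a few lines. The sketch below applies one backpropagation step to a single-hidden-layer sigmoid network; the shapes and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, w1, c1, w2, c2, eta=0.1):
    """One backpropagation update for a single-hidden-layer sigmoid
    network, following (8)-(11)."""
    # Forward pass.
    o1 = sigmoid(x @ w1 + c1)        # hidden layer outputs
    o2 = sigmoid(o1 @ w2 + c2)       # network outputs
    # Error terms: output layer (8), then hidden layer (9).
    err2 = o2 * (1 - o2) * (t - o2)
    err1 = o1 * (1 - o1) * (err2 @ w2.T)
    # Weight and bias updates, (10) and (11).
    w2 += eta * np.outer(o1, err2)
    c2 += eta * err2
    w1 += eta * np.outer(x, err1)
    c1 += eta * err1
    return w1, c1, w2, c2

# Toy demo with illustrative sizes: 6 inputs, 5 hidden units, 1 output.
rng = np.random.default_rng(3)
w1, c1 = rng.normal(scale=0.1, size=(6, 5)), np.zeros(5)
w2, c2 = rng.normal(scale=0.1, size=(5, 1)), np.zeros(1)
x, t = rng.normal(size=6), np.array([0.5])
w1, c1, w2, c2 = backprop_step(x, t, w1, c1, w2, c2)
```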

      As described earlier, the training of a DBN comprises an unsupervised training phase and a supervised training phase. The initial weights are adjusted into an appropriate region during unsupervised training, and the whole network is then fine-tuned by backpropagation in the supervised training phase to achieve accurate modelling results. The rich latent information extracted from the input variables during unsupervised training makes the model more interpretable. This semi-supervised scheme improves the robustness and generalization capability of a model with a deep architecture.

    • Advanced monitoring, control, and optimization techniques are essential in modern industrial chemical processes to overcome the issue of high cost and improve production efficiency[22]. In this paper, DBN is used to develop soft sensors for a polypropylene production plant in China. In this plant, two continuous stirred tank reactors (CSTR) and two fluidized-bed reactors (FBR) are used to produce polypropylene, as shown in Fig. 5. Propylene, hydrogen, and catalyst are fed to the reactors; the gases and liquids fed to the reactors serve both as reactants for the growing polymer particles and as heat transfer media. The melt index is a key polymer quality variable and should be closely monitored and controlled. The MI of polypropylene depends on many factors such as the catalyst, the reactor temperature and the concentrations of the reaction materials. For example, hydrogen can increase the polymerization rate of polypropylene, mainly the initial polymerization rate of propylene[23]. The hydrogen concentration regulates the molecular weight of the polypropylene, and hydrogen can also delay the decay rate of the catalyst. Because polymer MI is difficult to measure in this process, the relationship between MI and process variables that can be measured easily needs to be found, so that inferential estimates of MI can be obtained by soft sensors. As this industrial process is very complicated, it is difficult to develop first-principles models linking polymer MI with easy-to-measure process variables. Therefore, nonlinear data-driven models need to be utilised in developing soft sensors for this process.

      Figure 5.  Propylene polymerization process

      The polypropylene grades are related to some key variables, such as reactant composition, reactor temperature and catalyst properties. The feedstock of D201 consists of propylene, hydrogen and catalyst, and the co-monomer is added to D204. Several grades of polymer were produced within one month, and industrial process operational data covering this period are available for this application. In this process, polymer MI was logged every two hours and process variables were logged every half hour. In fact, MI is only highly correlated with a few process variables. Based on the research of Zhang et al.[24], there are strong correlations between the MI of the polymer in reactor D204 and the hydrogen concentrations in reactors D201 and D202, while the MI of the polymer in reactor D201 is strongly correlated with the hydrogen concentration and feed rate in reactor D201[24]. The hydrogen concentrations in D201 and D202, the hydrogen feed rate, and the MI of polypropylene in reactors D201 and D204 are shown in Figs. 6-8, respectively. Due to industrial confidentiality, the units of these variables are not disclosed.

      Figure 6.  Concentration of hydrogen in (a) D201 and (b) D202

      Figure 7.  Feed rate of hydrogen

      Figure 8.  Melt index in (a) D201 and (b) D204

      From Fig. 8, it can be observed that the MI data cover quite a wide range; thus, the data are suitable for developing data-driven models. Soft sensors should extract the information from the limited process and quality data to obtain accurate estimations of MI. From the trends displayed in Figs. 6-8, it can be seen that MI is highly correlated with the hydrogen feed rate and concentrations.

      The time delays of the industrial process can be found by cross-correlation analysis[24]. The data-driven models for inferential estimation of MI can be represented as

      $ \begin{split} M{I_1}\left( t \right) =\,& {f_1}[{H_1}\left( t \right),{H_1}\left( {t - 1} \right),{H_1}\left( {t - 2} \right),F\left( {t - 9} \right),\\ & F\left( {t - 10} \right),F\left( {t - 11} \right)] \end{split} $

      (12)

      $ \begin{split} M{I_2}\left( t \right) = \,& {f_2}[{H_1}\left( {t - 7} \right),{H_1}\left( {t - 8} \right),{H_1}\left( {t - 9} \right),\\ & {H_2}\left( {t - 6} \right),{H_2}\left( {t - 7} \right),{H_2}\left( {t - 8} \right)] \end{split} $

      (13)

      where MI1 and MI2 are the MI in D201 and D204, respectively, H1 and H2 are the concentrations of hydrogen in D201 and D202, respectively, and F is the hydrogen feed rate to D201.
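      To make the lagged input structure of (12) and (13) concrete, the sketch below assembles the model input vectors from the measured series. The assumption that H1, H2 and F are numpy arrays aligned at the model sampling instants, and the synthetic demo data, are illustrative only.

```python
import numpy as np

def lagged_inputs_mi1(H1, F, t):
    """Input vector of (12) for estimating MI1 at time t.

    H1: hydrogen concentration in D201; F: hydrogen feed rate to D201.
    """
    return np.array([H1[t], H1[t - 1], H1[t - 2],
                     F[t - 9], F[t - 10], F[t - 11]])

def lagged_inputs_mi2(H1, H2, t):
    """Input vector of (13) for estimating MI2 at time t.

    H2: hydrogen concentration in D202.
    """
    return np.array([H1[t - 7], H1[t - 8], H1[t - 9],
                     H2[t - 6], H2[t - 7], H2[t - 8]])

# Toy demo with synthetic series (illustrative only).
rng = np.random.default_rng(4)
H1, H2, F = rng.normal(size=(3, 50))
print(lagged_inputs_mi1(H1, F, t=20))
print(lagged_inputs_mi2(H1, H2, t=20))
```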

      The original data set contains 1534 samples of process operational data and 383 samples of quality data (MI) available for establishing the data-driven DBN models. The number of process variable samples is thus larger than the number of quality variable samples: only 383 process samples have corresponding quality values. However, the remaining process samples can be utilized by the DBN in the unsupervised training phase. By such means, the DBN can capture much valuable information from the process data, and the estimation of MI achieved by the DBN can be improved.

      The data set for the supervised training phase was separated into a training data set, a testing data set and an unseen validation data set. The partitions of the data sets for estimating MI1 and MI2 are presented in Tables 1 and 2, respectively.

      Data sets                 Percentage    Number of samples
      Training data             50%           192
      Testing data              22%           85
      Unseen validation data    28%           106

      Table 1.  Partition of data sets for estimating MI1

      Data sets                 Percentage    Number of samples
      Training data             52%           200
      Testing data              18%           68
      Unseen validation data    30%           115

      Table 2.  Partition of data sets for estimating MI2

      The selection of model structure can be determined using the training and testing data sets through cross-validation. The unseen validation data are used to test the performance of the final developed DBN model.

      As can be seen from Tables 1 and 2, 277 samples of training and testing data were used to fine-tune the DBN by backpropagation for MI1, and 268 samples were used in the supervised training phase for MI2. During the unsupervised training phase of the DBN models, only input data are required and target values are not; input samples without corresponding output data are termed "unlabeled" process data. Therefore, in the unsupervised training phase, process variable samples without corresponding MI data can also be utilized, whereas such "unlabeled" process variables cannot be used by traditional neural networks for inferential estimation of product quality. For comparison, conventional neural network models were also developed.
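      The paper does not state how labelled samples were assigned to the three sets; the sketch below assumes a random permutation with the proportions of Table 1, purely as an illustration of the partitioning step.

```python
import numpy as np

def partition(X, y, fractions=(0.50, 0.22, 0.28),
              rng=np.random.default_rng(0)):
    """Split labelled samples into training / testing / unseen validation
    sets with the proportions of Table 1 (random assignment assumed)."""
    idx = rng.permutation(len(y))
    n_train = round(fractions[0] * len(y))
    n_test = round(fractions[1] * len(y))
    train, test, valid = np.split(idx, [n_train, n_train + n_test])
    return ((X[train], y[train]), (X[test], y[test]), (X[valid], y[valid]))

# Toy demo: 383 labelled samples with 6 lagged inputs each.
rng = np.random.default_rng(5)
X, y = rng.normal(size=(383, 6)), rng.normal(size=383)
(train, test, valid) = partition(X, y)
print(len(train[1]), len(test[1]), len(valid[1]))
```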

    • The model structures need to be determined first. In this study, 25 DBN models with different architectures were developed and compared, and the one giving the best performance on the testing data set was regarded as having the appropriate structure. These DBN models have one visible layer (input layer), one additional top layer (output layer) and two hidden layers. The learning rate is 0.01 in the unsupervised training phase and 0.0015 in the supervised training phase. The structures of the 25 DBN models are shown in Table 3. Figs. 9 and 10 present the sums of squared errors (SSE) on the training data set and testing data set, respectively, for these 25 DBN models for estimating MI1.

      No.    Neurons in 1st hidden layer    Neurons in 2nd hidden layer
      1      2      1
      2      2      2
      3      3      3
      4      3      2
      5      4      4
      6      4      3
      7      5      5
      8      5      4
      9      6      6
      10     6      5
      11     7      7
      12     7      6
      13     8      8
      14     8      7
      15     9      9
      16     9      8
      17     10     10
      18     10     9
      19     11     11
      20     11     10
      21     12     12
      22     12     11
      23     13     13
      24     13     12
      25     14     13

      Table 3.  DBN models with different structures

      Figure 9.  SSE on training data for estimating MI1

      Figure 10.  SSE on testing data for estimating MI1

      From Figs. 9 and 10, the 7th DBN model gives the best generalization performance on the testing data set, and the 6th DBN model gives the second lowest testing error. The 12th to 25th DBN models have lower training errors than the 7th model but larger testing errors; thus, they are likely to have suffered from overfitting and their structures should not be selected. From the results given in Figs. 9 and 10, the number of neurons in the first hidden layer can be taken as 5. As can be seen from Table 3, these 25 DBN models have similar numbers of neurons in the first and second hidden layers. The first step of this investigation confirmed that the 7th DBN gave the best performance among these 25 DBN models. To avoid the possibility that some DBN model not included in Table 3 might give better performance, the second step investigates the number of neurons in the second hidden layer further: nine additional DBN models, with 5 neurons in the first hidden layer and 2 to 10 neurons in the second hidden layer, were constructed. The training and testing errors of these DBN models are shown in Table 4.

      No.    Neurons in 1st hidden layer    Neurons in 2nd hidden layer    SSE (training)    SSE (testing)
      1      5      2      0.7562    0.5819
      2      5      3      0.8204    0.6193
      3      5      4      0.7824    0.5945
      4      5      5      0.7696    0.5118
      5      5      6      0.8206    0.5773
      6      5      7      0.7271    0.5742
      7      5      8      0.6723    0.5859
      8      5      9      0.7628    0.6071
      9      5      10     0.7372    0.6322

      Table 4.  Errors of DBN models with different structures for estimating MI1

      From Table 4, it can be seen that the 7th DBN has the smallest training error but not the smallest testing error, and the testing errors of the 6th to 9th DBNs increase, indicating that these models are overfitted. The 4th DBN (i.e., the 7th DBN model in Table 3) has the lowest testing error among all the DBN models. This indicates that the 4th DBN model performs better than the other models and its structure should be adopted.
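      The second search step above amounts to a simple loop that selects the structure minimizing the SSE on the testing data. In the sketch below, train_dbn and predict are placeholder stubs standing in for a full DBN pretraining and fine-tuning implementation; only the selection logic reflects the procedure described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for a real DBN pipeline; substitute your own
# CD pretraining + backpropagation fine-tuning code here.
def train_dbn(structure, X, y):
    return structure                 # a real implementation returns a trained model

def predict(model, X):
    return np.zeros(len(X))          # a real implementation returns MI estimations

def sse(model, X, y):
    e = y - predict(model, X)
    return float(e @ e)

# Synthetic stand-ins for the MI1 training/testing sets of Table 1.
X_train, y_train = rng.normal(size=(192, 6)), rng.normal(size=192)
X_test, y_test = rng.normal(size=(85, 6)), rng.normal(size=85)

# Fix 5 neurons in the first hidden layer and scan the second hidden
# layer from 2 to 10 units, as in Table 4.
best, best_sse = None, float("inf")
for n2 in range(2, 11):
    model = train_dbn((5, n2), X_train, y_train)
    s = sse(model, X_test, y_test)   # select on testing (generalization) error
    if s < best_sse:
        best, best_sse = (5, n2), s
print("selected structure:", best)
```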

      In order to demonstrate the advantage of using input samples without corresponding target values as additional training data in the unsupervised training phase, a DBN model trained only on the input samples with pre-existing labels in the unsupervised training phase was also developed. This is DBN No. 1 in Table 5; DBN No. 2 was additionally trained on the "unlabeled" process data without corresponding MI samples, and is in fact the 4th DBN model in Table 4. The two DBN models in Table 5 have the same structure. It can be seen from Table 5 that DBN No. 1 has larger SSE values on the training, testing and validation data sets than DBN No. 2. Therefore, the DBN can extract more features from the "unlabeled" data, and DBN No. 2 gives better performance than DBN No. 1.

      DBN No.    SSE (training)    SSE (testing)    SSE (validation)
      1          1.6203            0.8905           0.7024
      2          0.7696            0.5118           0.6851

      Table 5.  Errors of DBN models for estimating MI1 with different input data

      Seven conventional single-hidden-layer feedforward neural network models were also established for comparison. The SSE values of these conventional feedforward neural networks with different structures on the training and testing data are given in Table 6. From Table 6, the 4th neural network has the lowest testing SSE for estimating MI1 and the 3rd neural network has the lowest testing SSE for estimating MI2.

      No.    Neurons in hidden layer    SSE (training, MI1)    SSE (testing, MI1)    SSE (training, MI2)    SSE (testing, MI2)
      1      2      1.3256    0.7446    1.6025    0.6855
      2      3      0.7949    0.8221    1.5185    0.7374
      3      4      0.7924    0.6527    1.5035    0.6564
      4      5      0.7675    0.6323    1.3650    0.6883
      5      6      0.6347    0.6532    1.1009    0.8214
      6      7      0.5124    1.1054    0.7844    0.8305
      7      8      0.4201    0.8895    0.5108    1.6024

      Table 6.  Errors of neural networks with different structures

      The estimations of MI1 on the unseen validation data by the DBN and the conventional feedforward neural network are shown in Fig. 11. In Fig. 11, the solid, dashed, and dotted lines represent, respectively, the actual values of MI1, the estimations by the DBN, and the estimations by the conventional feedforward neural network. It can be seen from Fig. 11 that the estimations by the DBN model are generally closer to the corresponding actual values of MI1 than those by the feedforward neural network. The SSE values of both models are presented in Table 7. The SSE of the DBN on the training data set is slightly larger than that of the neural network, but the SSE values of the DBN on the testing and unseen validation data sets are much smaller. The inferential estimation of MI1 thus demonstrates the strong generalization capability of the DBN, which extracted rich latent information from the process data during the unsupervised training phase. Overall, the DBN model gives more accurate estimations of MI1.

      Models            SSE (training)    SSE (testing)    SSE (validation)
      Neural network    0.7675            0.6323           0.8243
      DBN               0.7696            0.5118           0.6851

      Table 7.  SSE of estimating MI1

      Figure 11.  Estimation of MI1 by DBN and neural network

      Fig. 12 compares the estimations of MI2 by the DBN and the conventional feedforward neural network on the unseen validation data. In Fig. 12, the solid, dashed, and dotted lines represent, respectively, the actual values of MI2, the estimations by the DBN, and the estimations by the conventional feedforward neural network. From Fig. 12, it can be seen that both models give similar performance when MI values are high; however, when MI values are low, the DBN model gives better estimations. Table 8 shows the SSE values for the estimation of MI2. The SSE of the DBN on the training data is slightly larger than that of the neural network, but the SSE values of the DBN on the testing and unseen validation data sets are much smaller. The results in Fig. 12 and Table 8 indicate that the estimations of MI2 achieved by the DBN are more reliable and accurate than those from the conventional feedforward neural network.

      Models            SSE (training)    SSE (testing)    SSE (validation)
      Neural network    1.5035            0.6564           0.9915
      DBN               1.5170            0.4342           0.8560

      Table 8.  SSE of estimating MI2

      Figure 12.  Estimation of MI2 by DBN and neural network

    • DBN models for the on-line inferential estimation of the polymer melt index in an industrial polymerization process are developed in this paper. A DBN can be developed with a deep structure, and rich latent information can be extracted from the process variables. The "unlabeled" process data, which are of no use to conventional neural network models, can be used in the unsupervised training stage of the DBN; it is shown in this paper that the accuracy of inferential estimation of polymer MI can be improved by this means. The selection of DBN structure is investigated in the paper, and appropriate structures for the estimation of MI1 and MI2 are selected. The DBN gives much better performance than the conventional feedforward neural networks. The study demonstrates that DBN is very suitable for developing nonlinear data-driven models for the inferential estimation of polymer melt index. The proposed DBN model could be extended to multi-step-ahead prediction models in the future, and the network structure could be further optimized to improve robustness.

    • This work was supported by the National Natural Science Foundation of China (No. 61673236) and the European Union (No. PIRSES-GA-2013-612230).

    • This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

      The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

      To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
