Abstract
Precisely forecasting the operational characteristics of oil pipelines is essential for developing rational design, production, and operation strategies and for reducing energy consumption. Conventional mechanism models can produce widely scattered calculation results, and machine learning models perform poorly on limited sample data, so predictions from either approach alone have limited practical value. In this study, a novel physics-guided neural network (PGNN) model, which integrates mechanism models with machine learning, is introduced. The proposed model incorporates the essential physical intermediate variables that affect the temperature and pressure of oil pipelines as artificial neurons within the loss function, and an adaptive moment estimation algorithm is employed to optimize the model parameters. A comparative analysis of several models' predictive performance on an oil pipeline shows that the PGNN achieves the highest accuracy in forecasting pipeline temperature and pressure. Furthermore, the PGNN produces satisfactory predictions even with a limited sample size, and its predictions correlate more strongly with the variables that directly affect temperature and pressure.
1 Introduction
As the main mode of oil transportation, pipeline transportation is the core of the oil transportation industry [1]. Because of its unique physical properties, waxy crude oil flows poorly at low temperatures, so its transportation generally requires heating [2]. This requirement makes waxy crude oil pipelines consume more energy than conventional pipelines, in two main ways: on the one hand, the pumping units consume electrical energy to pressurize the pipeline; on the other hand, the heating furnaces consume thermal energy to heat the crude oil [3]. Therefore, to meet corporate energy-conservation and emission-reduction requirements and to ensure the safe and economical operation of oil pipelines, accurately tracking changes in pipeline temperature and pressure is the basis for pipeline simulation and operation optimization [4].
In crude oil pipeline transportation systems, traditional parameter prediction for effective management and operation optimization falls into two major categories: process calculation methods and statistical methods [5]. The process calculation method establishes mechanism models based on kinetic and thermodynamic analyses [6], calculates the relevant parameters according to the pipeline's process [7], and optimizes the pipeline's operation plan. Commonly used pipeline simulation software packages based on physical models include OLGA [8], SPS [9], and PIPESIM [10]. Because the calculation involves station equipment as well as the pipeline itself, many parameters cannot be obtained accurately, which limits the accuracy of the predicted parameters. The statistical method performs regression prediction of the parameters from the pipeline's historical production data and does not need to account for the complexity of the actual process [11]; the model construction is relatively simple. Commonly used statistical methods include linear regression (LR) [12], the autoregressive integrated moving average model [13], and the gray prediction model [14]. Prediction accuracy improves as data accumulate, so a large amount of data is required, and the achievable accuracy is limited [15].
To overcome the limitations of traditional parameter prediction methods, machine learning methods offer clear advantages in process prediction. For example, Yarveicy et al. [16,17] used different machine learning methods to predict the minimum initial temperature required for natural gas after throttling and the solubility of CO2 in an aqueous piperazine (PZ) solution, respectively, and both achieved good predictions of practical importance. In addition, they applied Extra Trees combined with the least squares support vector machine (LSSVM) algorithm to model the phase equilibrium of gas hydrates and found that Extra Trees outperforms LSSVM in prediction accuracy, which is of significant application value for predicting gas hydrate phenomena [18]. In the prediction of pipeline operating parameters, Sun et al. [19] developed a split-systems-approach-based model to predict pipeline reliability, which effectively supports optimal pipeline maintenance decisions. Su et al. [20] integrated deep learning techniques with controllability theory to forecast the operational parameters of natural gas pipelines, achieving an average prediction accuracy of 0.99. Peng et al. [21] proposed a model combining support vector regression, principal component analysis, and chaotic particle swarm optimization, which is effective in predicting the corrosion rate of multiphase flow pipelines. Temperature and pressure are among the most important parameters in pipeline production and operation; machine learning methods are of great significance for their accurate prediction, which helps ensure the safe operation of pipeline systems. Shadloo et al. [22] utilized a multilayer perceptron neural network model to forecast the pressure drop in horizontal pipelines, yielding superior accuracy compared to conventional pressure drop computation techniques. Zhang et al. [23] assessed the predictive performance of three neural networks (backpropagation neural network (BPNN), radial basis function neural network (RBFNN), and general regression neural network) for pipeline temperature and pressure and identified the most effective one; using the RBFNN for forecasting reduces the overall energy cost by up to 10.75%. Li et al. [24] developed a three-dimensional computational model for oil pipelines in cold regions to predict the oil temperature along the pipeline, which effectively reduces the risk of pipeline freezing and plugging. Wei et al. [25] compared the BPNN model with a BP neural network improved by particle swarm optimization to analyze pipeline pressure drop; they ultimately implemented the PSO-BP model and attained an accurate prediction of pressure drop.
Although previous efforts to predict oil pipeline parameters have achieved good prediction accuracy, most models focus only on the relationship between input and output, so the calculation process lacks physical meaning and predictions are biased for some specific operating conditions [22–26]. To address this limitation, a growing number of studies have combined mechanistic models with machine learning to achieve accurate prediction [27]. Zheng et al. [28] proposed a theory-guided long short-term memory model for pipeline shutdown pressure prediction, which not only exhibits good prediction performance but also follows physical principles and engineering theories. Zhang et al. [29,30] made full use of the advantages of both mechanistic models and machine learning to predict liquid holdup and pressure drop in gas–liquid two-phase pipeline flow and the energy consumption of crude oil pipelines, respectively, effectively improving the prediction accuracy of the models. Yuan et al. [31] combined a mechanistic model based on the Austin–Palfrey equation with machine learning to predict the length of the pipeline mixing section, which not only improved prediction accuracy but also increased the ability to recognize outliers. However, studies combining mechanistic modeling and machine learning for predicting oil pipeline temperature and pressure remain relatively scarce.
Therefore, to fully integrate the advantages of mechanistic modeling and machine learning methods, a physics-guided neural network (PGNN) model is developed in this paper to predict the temperature drop and pressure drop of oil pipelines. The model adds the physical intermediate variables affecting the temperature and pressure of the oil pipeline as neurons in the loss function and optimizes the model parameters using an adaptive moment estimation algorithm, yielding good prediction performance. It not only compensates for the shortcomings of the mechanistic model but also provides solid theoretical support for the prediction. The paper is organized as follows: Sec. 2 introduces the adopted PGNN model and its background. The data sources and processing are described in Sec. 3. Section 4 assesses the accuracy and usefulness of the model predictions, explains the prediction results, and summarizes the research work.
2 Research Methods
2.1 Modeling of Temperature and Pressure Loss Mechanisms in Crude Oil Pipelines.
The axial temperature drop of the buried hot oil pipeline is described by

$t_L = t_0 + (t_R - t_0)\,e^{-aL}, \qquad a = \dfrac{K \pi D}{G c_h}$

In the formula, $G$ is the mass flowrate of oil, kg/s; $c_h$ is the specific heat capacity of the oil, J/(kg·℃); $D$ is the pipe outer diameter, m; $L$ is the length of pipe, m; $K$ is the total heat transfer coefficient of the pipe, W/(m²·℃); $t_R$ is the starting temperature of the pipeline, ℃; $t_L$ is the temperature at distance $L$ from the starting point, ℃; $t_0$ is the soil temperature around the pipe, ℃; and $a$ is the parameter defined above.
The total heat transfer coefficient K is a crucial element in the formula above, directly affecting the temperature drop. Measuring it is challenging because of the intricate environment of buried pipelines and because the heat transmission process comprises three primary stages: from the hot oil, through the pipe wall, to the surrounding soil. This study therefore uses real production data and inverts the temperature drop formula to determine the K value for the oil pipeline temperature drop model.
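The back-calculation described above can be sketched as follows. This is a minimal illustration assuming the standard axial temperature drop relation implied by the symbol definitions; the function name and the numerical values in the usage are hypothetical:

```python
import math

def k_from_temperatures(G, c_h, D, L, t_R, t_L, t_0):
    """Back-calculate the total heat transfer coefficient K, W/(m^2*C),
    by inverting the axial temperature drop formula
        t_L = t_0 + (t_R - t_0) * exp(-a * L),  a = K * pi * D / (G * c_h).
    Arguments use the units defined in the text (kg/s, J/(kg*C), m, C)."""
    # Solve for a from the measured inlet/outlet/soil temperatures
    a = math.log((t_R - t_0) / (t_L - t_0)) / L
    # Then recover K from a's definition
    return a * G * c_h / (math.pi * D)
```

In practice, one K value would be back-calculated per operating record, producing the target data used to train the network's intermediate-variable output.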
$H = h_f + h_j + (Z_m - Z_c)$

In the formula, $H$ is the pressure drop in the pipeline, m; $h_f$ is the friction loss along the pipeline, m; $h_j$ is the total local friction loss, m; $Z_m$ is the elevation of the endpoint, m; and $Z_c$ is the elevation of the initial point, m.
$h_f = \lambda \dfrac{L}{D} \cdot \dfrac{V^2}{2g}$

In the formula, $\lambda$ is the hydraulic friction coefficient; $D$ is the pipe diameter, m; $L$ is the length of the pipe, m; $V$ is the flow velocity, m/s; and $g$ is the acceleration due to gravity, m/s².
For the long-distance pipeline studied in this paper, the local friction losses typically account for only 1–2% of the overall friction loss. Additionally, the pipeline is situated on level terrain, so the elevation difference can be disregarded. Therefore, the friction loss along the pipeline can be taken as the actual pipeline pressure drop, and the hydraulic friction coefficient is likewise back-calculated to provide training data for the oil pipeline pressure drop prediction model.
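The corresponding inversion for the hydraulic friction coefficient is a direct rearrangement of the Darcy-Weisbach relation; a minimal sketch (function name hypothetical):

```python
def lambda_from_pressure_drop(h_f, D, L, V, g=9.81):
    """Invert h_f = lambda * (L / D) * V^2 / (2 * g) to recover the
    hydraulic friction coefficient lambda from a measured friction
    loss h_f (m), pipe diameter D (m), length L (m), velocity V (m/s)."""
    return h_f * D * 2.0 * g / (L * V * V)
```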
2.2 Mechanism-Based Prediction Model Construction.
$y = b\left(\sum_{i=1}^{n} w_i x_i - e\right)$

In the formula, $b$ is the activation function; $w_i$ is the weight of the $i$th input; $x_i$ is the $i$th input; $e$ is the response sensitivity (threshold) of the neural unit to signals; and $y$ is the output of the neural unit.
The traditional artificial neural network model consists of three main parts: the input layer, the hidden layer, and the output layer [34]. Figure 1 displays the structure.
Traditional neural networks learn patterns from data without prior knowledge of physical principles, which can lead to conflicts between the findings of the data-driven model and the physical mechanism [35]. Outliers in the input data may also produce erroneous predictions and hinder generalization to fresh data. This research forecasts the temperature and pressure of oil pipelines by incorporating the mechanism into a neural network, ensuring that the prediction results remain physically consistent. The total heat transfer coefficient and hydraulic friction coefficient are back-calculated from historical production data using the temperature drop and pressure drop mechanism models to provide training data for the neural network. The mass flowrate, ambient temperature, and pipeline inlet temperature form the input layer of the neural network; the hidden layer captures the non-linear relationship between these inputs and the λ and K values. The results are then passed to a P–T layer, into which the relevant pipeline and oil parameters are entered, and this layer computes the pressure and temperature at the pipe outlet. The structure of the neural network model (PGNN) based on the physical models of pipeline temperature drop and pressure drop is illustrated in Fig. 2.
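The forward pass just described can be sketched as follows. This is only a schematic of the idea, not the authors' implementation: the layer sizes, tanh activation, and the exponential used to keep K and λ positive are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pgnn_forward(x, W1, b1, W2, b2, pipe):
    """Sketch of the PGNN forward pass: a small MLP maps the inputs
    (mass flowrate G, ambient temperature t0, inlet temperature tR) to
    the physical intermediate variables (K, lambda); a fixed 'P-T layer'
    then applies the mechanism formulas to get outlet temperature and
    the pressure drop head."""
    G, t0, tR = x
    h = np.tanh(W1 @ x + b1)          # hidden layer
    K, lam = np.exp(W2 @ h + b2)      # exp keeps intermediate variables positive
    D, L, c_h, V, g = pipe
    a = K * np.pi * D / (G * c_h)
    tL = t0 + (tR - t0) * np.exp(-a * L)   # temperature drop model
    H = lam * (L / D) * V**2 / (2 * g)     # pressure drop model
    return tL, H, K, lam
```

With untrained random weights the outputs are of course not calibrated; the point is that K and λ are explicit neurons whose values feed a fixed physics layer.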
2.3 Loss Function Optimization.
$\mathrm{Loss} = \omega_1\,\mathrm{MSE}(\hat{t}_L, t_L) + \omega_2\,\mathrm{MSE}(\hat{H}, H) + \omega_3\,\mathrm{MSE}(\hat{K}, K) + \omega_4\,\mathrm{MSE}(\hat{\lambda}, \lambda)$

In the formula, MSE is the mean square error; $\hat{x}$ denotes an estimate of the quantity $x$; and $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ are the weights of the four error terms.
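A minimal sketch of such a composite loss is given below. The pairing of the four weighted terms with the two outputs (temperature, pressure) and the two physical intermediate variables (K, λ), and the default weight values, are illustrative assumptions:

```python
import numpy as np

def mse(pred, true):
    """Mean square error between two arrays."""
    return float(np.mean((np.asarray(pred) - np.asarray(true)) ** 2))

def pgnn_loss(pred, true, w=(1.0, 1.0, 0.5, 0.5)):
    """Weighted sum of MSE terms over the model outputs and the
    physical intermediate variables.
    pred / true: dicts with keys 't' (temperature), 'H' (pressure drop),
    'K' (heat transfer coefficient), 'lam' (friction coefficient)."""
    w1, w2, w3, w4 = w
    return (w1 * mse(pred['t'], true['t'])
            + w2 * mse(pred['H'], true['H'])
            + w3 * mse(pred['K'], true['K'])
            + w4 * mse(pred['lam'], true['lam']))
```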
2.4 Parameter Optimization Using the Adaptive Moment Estimation Algorithm.
Because the aforementioned loss function introduces four error term weights, the adaptive moment estimation (Adam) method is employed to tune the PGNN model parameters. Adam is an optimization algorithm introduced by Kingma and Ba [37]. It builds on the concept of moment estimation, calculating the first-order and second-order moments of the gradient to design adaptive learning rates for different parameters. This effectively balances convergence speed and stability, and the algorithm is primarily used for updating neural network parameters [38].
$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$

In the formula, $m_t$ is the momentum at time-step $t$; $g_t$ is the current time-step gradient; and $\beta_1$ is the momentum attenuation coefficient, usually 0.9.
$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$

In the formula, $v_t$ is the exponential moving average of the squared gradient at time-step $t$; $\beta_2$ is the attenuation coefficient of the squared gradient, usually 0.999.
$w_t = w_{t-1} - \alpha \dfrac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \qquad \hat{m}_t = \dfrac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \dfrac{v_t}{1 - \beta_2^t}$

In the formula, $w_t$ is the weight at time-step $t$; $\alpha$ is the learning rate, usually 0.001; and $\epsilon$ is a small constant added for numerical stability, usually $10^{-8}$.
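The update rules above translate directly into code; a minimal single-step implementation of standard Adam (with the usual bias correction of the first and second moments) is:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update at time-step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

For example, iterating this step on the gradient of a simple quadratic drives the parameter toward its minimum at roughly the learning-rate step size per iteration.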
The error term weights in the loss function of the PGNN model are chosen as the decision variables to minimize the loss. The flowchart of the optimization using the Adam algorithm is shown in Fig. 3.
3 Data Preprocessing
3.1 Data Source.
The research data for this paper come from a long-distance thermal oil pipeline. The data are gathered via authentic on-site operation reports, which mostly consist of operating data such as pipeline pressure, temperature, transportation volume, and ambient temperature. Table 1 displays the specifications of the pipeline and the characteristics of the oil. The actual operating data of the pipeline are shown in Table 2.
Number | Parameter (Unit) | Value |
---|---|---|
1 | Pipeline length (m) | 31,360 |
2 | Pipeline outer diameter (mm) | 630 |
3 | Pipeline wall thickness (mm) | 8 |
4 | Oil density (20 ℃) (kg/m3) | 871.3 |
5 | Oil freezing point (℃) | 35 |
6 | Oil viscosity (mPa·s) | 25.5–931 |
7 | Oil specific heat capacity (J/kg/K) | 1745–1962 |
Parameter (Unit) | Minimum | Maximum | Average |
---|---|---|---|
Throughput (t) | 7824 | 29,242 | 18,142 |
Environmental temperature (℃) | −7.6 | 15.36 | 3.9 |
Temperature drop (℃) | 2.7 | 11.5 | 5.2 |
Pressure drop (MPa) | 0.3 | 1.5 | 0.82 |
3.2 Noisy Data Processing.
When collecting data, the many factors involved may generate records that do not conform to actual operating conditions, degrading the training of the model. Therefore, to improve data quality, wavelet denoising is used to smooth and denoise the data collected from the oil pipeline.
Typical threshold processing techniques are hard thresholding and soft thresholding. Hard thresholding sets wavelet coefficients whose magnitude is below the threshold directly to zero while leaving the remaining coefficients unchanged, which preserves the amplitude of the retained features. Soft thresholding also zeroes coefficients below the threshold but additionally shrinks the remaining coefficients toward zero by the threshold value, which produces a smoother result.
Considering that the neural network training process requires diverse data, the hard threshold method is chosen for processing. The processing flowchart of the wavelet denoising method is shown in Fig. 4.
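The two thresholding rules can be sketched as below. This illustrates only the coefficient-thresholding step; the wavelet decomposition and reconstruction around it would be performed with a wavelet library (e.g., PyWavelets), which is not shown here:

```python
import numpy as np

def hard_threshold(coeffs, thresh):
    """Hard thresholding: zero coefficients whose magnitude is below
    the threshold; keep the rest unchanged."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) < thresh, 0.0, c)

def soft_threshold(coeffs, thresh):
    """Soft thresholding: zero coefficients below the threshold and
    shrink the remaining coefficients toward zero by the threshold."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
```

Note how hard thresholding leaves a coefficient of 3.0 untouched while soft thresholding shrinks it to 2.0 (for a threshold of 1.0), which is why the hard rule better preserves amplitudes.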
3.3 Data Set Partitioning.
The oil pipeline operation data for three consecutive years (36 months) are selected as the data set, and each piece of data is recorded daily. The training set and the test set are divided at a ratio of 8:2. In view of the fact that different division methods may cause differences in the distribution of sample data and the overall data, the data set was divided using the random sampling method and the stratified sampling method [40], respectively, and the optimal division method was selected to minimize the impact of the sampling error effectively.
Due to the varying throughput of distinct samples, the throughput is used as the sample characteristic for stratified sampling. Figure 5 displays the sample distribution created using stratified sampling and random sampling. The error bar represents the sample's standard deviation.
Stratified sampling improves the model's generalization ability on unseen data by precisely reflecting the properties of the total data, as shown in the figure.
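The stratified 8:2 split described above can be sketched as follows, binning samples by throughput so that each stratum contributes proportionally to both sets. The function name and bin choice are illustrative; scikit-learn's `train_test_split(..., stratify=...)` offers an equivalent off-the-shelf option.

```python
import numpy as np

def stratified_split(strata, test_frac=0.2, seed=0):
    """Split sample indices within each stratum (e.g., throughput bin)
    so that the train and test sets mirror the overall distribution.
    strata: array of stratum labels, one per sample."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        rng.shuffle(idx)
        n_test = max(1, int(round(test_frac * len(idx))))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)
```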
3.4 Data Normalization.
Before training the neural network model, it is customary to normalize the data to guarantee that data with different characteristics are on a consistent scale [41], often by mapping the data set to the value range of [0, 1]. The research utilizes linear normalization to standardize features and improve model accuracy through the use of normalized data. The formula for linear normalization is expressed as follows:
$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$

In the formula, $x'$ is the normalized data; $x_{\max}$ is the maximum value in data set A; and $x_{\min}$ is the minimum value.
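This min-max normalization is a one-liner in practice; a minimal sketch:

```python
import numpy as np

def min_max_normalize(a):
    """Map each value of data set A linearly into [0, 1]."""
    a = np.asarray(a, dtype=float)
    a_min, a_max = a.min(), a.max()
    return (a - a_min) / (a_max - a_min)
```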
4 Results and Analysis
4.1 Model Evaluation Criteria.
$\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}, \qquad R^2 = 1 - \dfrac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$

In the formula, $N$ is the number of samples; $\hat{y}_i$ is the predicted value; $y_i$ is the true value; and $\bar{y}$ is the mean of the true values.
A lower RMSE indicates a more accurate computation result, and a higher R² indicates a stronger correlation between the variables and thus a better fit. A model with a goodness of fit above 0.8 is generally considered to fit well.
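The two evaluation metrics can be computed directly from their definitions; a minimal sketch:

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error between predictions and true values."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r2(y_pred, y_true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)
```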
4.2 Model Prediction Accuracy.
To assess the prediction performance of several models, LR, BPNN, and PGNN are employed to predict the same dataset. The Adam algorithm is utilized to optimize the models' parameters. Figure 6 displays the forecast accuracy of each model, while Tables 3 and 4 show the performance metrics.
Model | RMSE (℃) | R2 |
---|---|---|
PGNN | 0.066 | 0.987 |
BP | 0.184 | 0.841 |
LR | 0.203 | 0.797 |
Model | RMSE (MPa) | R2 |
---|---|---|
PGNN | 0.071 | 0.946 |
BP | 0.195 | 0.832 |
LR | 0.227 | 0.780 |
Predictions were tested on datasets of varying sample sizes to thoroughly assess the model's prediction accuracy. Figure 7 displays the average prediction error for each model.
A comparison of the predicted and actual temperature and pressure values for each model in Fig. 6 shows that the values predicted by the PGNN model are closer to the actual test set data. In addition, the PGNN model has the smallest RMSE and the largest R2 in its prediction performance, both of which are better than the other two models.
From Fig. 7, it can be seen that the PGNN model can also achieve good prediction results with small sample data and minimize the average prediction error. Hence, the PGNN model is superior in forecasting the temperature and pressure of oil pipelines.
4.3 Physical Consistency.
To examine whether the model predictions conform to the physical consistency relationship, the absolute values of the Pearson correlation coefficients are shown in Fig. 8. The Pearson coefficients of the PGNN model's predictions are closer to one, indicating better agreement with the physical consistency requirements.
4.4 The Impact of Physical Intermediate Variables.
To assess the impact of physical intermediate variables on the model, various loss functions are employed to forecast temperature and pressure. The detailed outcomes are displayed in Tables 5 and 6. The table demonstrates that incorporating physical intermediate variables into the loss function notably enhances the predictive accuracy of the model.
5 Discussion
The PGNN model is established to predict the temperature and pressure of the oil pipeline, with input data drawn from the actual production data of a long-distance pipeline. The comparison between the stratified sampling method and the random sampling method for dividing the data set is shown in Fig. 5. The mean and standard deviation of the training and test sets produced by stratified sampling are clearly closer to those of the actual data and better reflect the characteristics of the overall data, which lays the foundation for subsequent model training and validation.
In the examination of the model prediction accuracy, the model evaluation indexes are chosen as the RMSE and the R2. Linear regression, backpropagation neural network, and PGNN models are used to predict the same dataset. The prediction accuracies are shown in Fig. 6, and the performance indexes are shown in Tables 3 and 4. As can be seen from Fig. 6, the values predicted by the PGNN model are closer to the actual test set data, and because of the introduction of physical intermediate variables in the model, the model is able to predict parameters with a wide range of transmission variations more accurately. The data in Tables 3 and 4 show that the PGNN model has the smallest predictive performance index RMSE for temperature and pressure, 0.066 and 0.071, respectively, and the largest R2, 0.987 and 0.946, respectively, which indicates that the model is able to accurately predict the parameters, proving the effectiveness of the PGNN model.
In order to explore the effect of different sample sizes on the prediction performance of the models, the average prediction errors of the three models under different sample sizes were compared. As shown in Fig. 7, the prediction error of the PGNN model is much smaller than that of the BP neural network model and the linear regression model, and good prediction results can be realized under small sample data.
Figure 8 evaluates the relationship between model predictions and physical consistency using the Pearson coefficient, and the Pearson coefficient of the prediction results of the PGNN model is close to one, indicating that the prediction results are highly consistent with the physical consistency requirements.
To test the influence of the physical intermediate variables on the prediction performance of the PGNN model, the prediction performance metrics of the PGNN model under different loss functions were compared, as shown in Tables 5 and 6. With the physical intermediate variables added, the RMSE of the temperature drop and pressure drop prediction models decreased by 0.031 and 0.042, respectively, and the R² improved by 0.176 and 0.143, respectively. These changes indicate that introducing physical intermediate variables enhances the prediction performance of the model.
6 Conclusion
This research suggests a PGNN-based model to predict temperature and pressure in oil pipelines located in cold regions. The Adam algorithm is utilized for model parameter optimization. By incorporating physical intermediate variables as neurons in PGNN and including the prediction error of these variables in the loss function, the mechanism becomes part of the prediction model, enhancing the model's interpretability.
Three prediction models, linear regression, backpropagation neural network, and PGNN, are applied to an oil pipeline to compare the predictive effectiveness of the models. The PGNN model exhibited the highest prediction accuracy, enhanced the model's generalization capability, and demonstrated the minimum average prediction error. The PGNN model can produce superior prediction outcomes even with limited sample sizes.
The Pearson correlation coefficient between the PGNN prediction results and variables influencing energy usage is close to one, indicating that the prediction findings align well with the requirements of physical consistency.
Comparing various loss functions on prediction accuracy reveals that incorporating physical intermediate variables into the loss function enhances the model's predictive performance.
Funding Data
• The Natural Science Foundation of Heilongjiang Province (Grant No. LH2022E025).
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.