Rate of Penetration Prediction Utilizing Hydromechanical Specific Energy

The prediction and the optimization of the rate of penetration (ROP), an important measure of drilling performance, have increasingly generated great interest. Several empirical techniques have been explored in the literature for the prediction and the optimization of ROP. In this study, four commonly used artificial intelligence (AI) algorithms are explored for the prediction of ROP based on the hydromechanical specific energy (HMSE) ROP model parameters. The AIs explored are the artificial neural network (ANN), extreme learning machine (ELM), support vector regression (SVR), and least-square support vector regression (LS-SVR). All the algorithms provided results with accuracy within acceptable range. The utilization of HMSE in selecting drilling variables for the prediction models provided an improved and consistent methodology of predicting ROP with drilling efficiency optimization objectives. This is valuable from an operational point of view, because it provides a reference point for measuring drilling efficiency and performance of the drilling process in terms of energy input and corresponding output in terms of ROP. The real-time drilling data utilized are must-haves, easily acquired, accessible, and controllable during drilling operations.


Introduction
The speed at which a drill bit breaks the rock under it to deepen the hole is called the rate of penetration (ROP). ROP prediction is necessary for effective drilling and cost optimization; therefore, it has been of great concern to drilling engineers during the last decades [1,2]. Maximization of ROP is often directly related to the minimization of drilling costs and is, therefore, a significant measure of drilling performance. Hydrocarbon accumulations are becoming increasingly difficult to find and reach in terms of depth and remoteness of location, and therefore more complex wells are being drilled. Effective prediction of ROP becomes imperative in order to improve the efficiency of the drilling process. It enables drilling engineers and the operations team to properly estimate the time for the drilling phase of operations and the associated costs, and to properly phase the operation in order to save cost. ROP prediction also helps to explain the reason behind a sudden slowdown in the drilling process, and therefore helps in making informed decisions on the optimization strategy to adopt.
Several techniques exist for predicting ROP, each with its own merits and demerits, and there is no universally accepted model for all conditions, as the nature of the relationships among the parameters that affect ROP is quite complex and unique for each case. Traditional ROP models usually predict ROP with many assumptions and a wide range of uncertainties due to the complexity of the interactions among the several parameters that affect ROP. ROP follows a complex relationship with several drilling parameters such as string rotation (RPM), weight on bit (WOB), mud weight (MW), flow rate, and bit hydraulics; formation properties such as compressive strength and pore pressure gradient; mud properties; mud hydraulics; borehole deviation; and the size and type of bit used. In some cases, increasing WOB and RPM could result in decreasing ROP, as these inputs interact with other factors that affect ROP. Understanding the underlying complex relationships among these parameters is important for the accurate prediction and optimization of ROP [3].
Predictive data-driven (PDA) modeling involves searching through complex data to identify patterns and adjust the program actions accordingly. During drilling operations, large volumes of real-time data are gathered, many of them related to ROP, but they are riddled with uncertainties and complex relationships that are better handled by data-driven analytical techniques. The ability of AI techniques to work through complex data sets and establish a relationship or trend without prior assumptions has made them attractive to engineers who seek to solve complex drilling engineering problems, especially when the geology and rock mechanics parameters differ from well to well and the recommended drilling parameters may therefore vary within a wide range [4].
Several studies have been carried out on predicting and optimizing ROP using AI techniques. Jahanbakhshi developed an artificial neural network (ANN) model for predicting ROP as a real-time analytical approach, with encouraging results [5]. Bodaghi et al. showed that optimized SVR has better accuracy and robustness in the prediction of ROP than a back propagation neural network (BPNN), and is a practicable method to implement for drilling optimization [6]. Also, Shi et al. showed a promising prospect for the extreme learning machine (ELM) and the upper-layer-solution-aware algorithm in predicting ROP, as they outperform the ANN model [7]. The study of Moraveji and Naderi concluded that the response surface methodology (RSM) statistical model provides an efficient tool for predicting ROP as a function of controllable and uncontrollable variables with reasonable accuracy [8]. Mantha and Samuel, using ANN, SVR, and classification and regression trees (CART), showed that ROP follows a complex relationship which cannot be comprehensively explained by traditional models alone. The application of data-driven analytics using several machine learning algorithms coupled with regression analysis can help in better understanding and predicting ROP [3].
This study seeks to improve ROP prediction by proposing the use of HMSE parameters as inputs to four AI techniques. The capabilities of the four AI techniques, namely artificial neural network (ANN), extreme learning machine (ELM), support vector regression (SVR), and least-square support vector regression (LS-SVR), are compared. To demonstrate this, a case study is presented using real data from two development wells from the onshore Niger Delta hydrocarbon province. The results show that all the AI techniques predicted ROP within an acceptable accuracy range and provided an improved and consistent methodology for predicting ROP with drilling efficiency optimization objectives.

ROP models
ROP is an important drilling parameter as a measure of performance in terms of both drilling cost savings and drilling efficiency. It is defined as the slope of the depth evaluated over a short time. It gives a perspective of how fast or slow a particular formation is being drilled or how operational conditions affect the functioning of the drilling system. The mathematical expression of ROP is given as [9]:
$$\mathrm{ROP} = \frac{dD}{dt}$$
where $D$ is the drilled depth and $t$ is time.
Factors affecting ROP can be divided into the following [5,10]:
• Personnel/Rig efficiency: this refers to the manpower and the efficiency of the hardware involved in the drilling operation. The experience of the personnel matters and is often a determinant in the selection of certain drilling parameters which affect ROP. The age, ratings, and technology of the drilling rig and associated hardware system also affect the efficiency of the selected drilling parameters in delivering optimum ROP output.
• Characteristics of the formation such as strength, hardness/abrasiveness, formation stress, elasticity, plasticity, pore pressure, balling tendency, porosity and permeability, etc. These parameters control ROP with varying degrees of uncertainty in the subsurface. The elasticity and ultimate strength of the formation are the most important parameters that affect ROP. In clastic environments, the normal compaction trend (NCT) indicates the increase in formation strength with increasing depth of burial. This relationship does not hold in carbonate environments. The chemical composition of the formation also affects ROP: formations containing abrasive minerals rapidly dull the bit, while formations with gummy clay minerals cling to the bit and ball it up. All these are uncontrollable factors that affect ROP [9].
• Mechanical factors such as RPM, bit type, and WOB, which are often referred to as the bit operating conditions.
Bit type selection is dependent on the type of formation to be drilled with a significant effect on ROP. Some bits such as roller cone bits with large cone offset angle and long teeth are only practical for soft formations due to fast tooth wear and hence a quick loss of ROP in harder formation. The fixed cutter bit is one where there are no moving parts, but drilling occurs due to shearing, scraping, or abrasion of the rock. Fixed cutter bits can be either polycrystalline diamond compact (PDC) or grit hot-pressed inserts (GHI) or natural diamond. They can also be matrix-body or steel-body, the selection of which depends on the application and the environment of use. Matrix is desirable as a bit material, because its hardness is resistant to abrasion and erosion. It is capable of withstanding relatively high compressive loads, but, compared with steel, has low resistance to impact loading. PDC bits are generally used for drilling soft but firm, and medium-hard, nonabrasive formations that are not sticky. The choice of bit therefore has a significant impact on ROP [9].
RPM: this is the revolutions per minute, which represents the rotational speed of the drill string. The top drive system (TDS), a revolutionary introduction into the rig system in the early 1980s, provides clockwise torque to the drill string to drill a borehole. Figure 1 shows an experimental result which indicates that ROP usually increases linearly with increasing values of RPM up to a certain point for a particular formation, illustrated as segment a-b, provided all other drilling parameters are kept constant, after which ROP starts to diminish, as seen in segment b-c. Point b is called "the bit floundering point."
Figure 1. Typical response of ROP to RPM.
Weight on bit (WOB): the WOB represents the amount of axial force applied to the bit, which is then transferred to the formation, causing it to break. The significance of WOB as a factor affecting ROP is illustrated in Figure 2. The figure shows zero ROP until the inertial breaking WOB is applied to the formation at point a. The ROP increases rapidly with increasing WOB, as observed in segment a-b; then, a linear increase in ROP is observed in segment b-c, followed by only a slight increase in ROP at high values of WOB in segment c-d. In extreme cases, a further increase in WOB will lead to a decrease in ROP, as seen in segment d-e. The point at which this occurs is called the floundering point.
• Hydraulic factors: this refers to the bit hydraulics. The two main hydraulic factors with significant effects on ROP are (i) jet velocity and (ii) bottom hole cleaning. Significant improvement in ROP can be achieved if proper nozzles are selected for a proper jetting action at the bit as drilling fluid flows at a determined flow rate through the drill string and the bit nozzles into the annulus. This promotes better cleaning action at the bit face as well as at the bottom of the hole.
Bottom hole cleaning is an important mechanism of removing drilled cuttings from the face of the bit. The jetting action of the mud passing through the bit nozzles has to provide enough velocity and cross flow across the surface of the bit to remove the newly drilled cuttings effectively as the bit penetrates the formation. This will prevent bit balling and regrinding of drilled cuttings by moving them up the annulus to maximize drilling efficiency of the bit.
• Drilling fluid properties: the two main mud properties with significant impact on hole cleaning are the mud density and viscosity.
Mud density: aside from serving as the primary control of the well, that is, prevention of formation-fluid intrusion into the wellbore, the mud density provides mechanical stabilization of the wellbore. Increasing the mud density beyond what is required to serve the aforementioned functions is detrimental to ROP and may cause induced losses by fracturing the formation under the in-situ stress condition. An increase in the mud density causes a decrease in ROP because it increases the bottom hole pressure beneath the bit, causing a chip hold-down effect. This leads to regrinding of drilled cuttings, with an adverse effect on penetration rate.
Viscosity tends to decrease ROP as it increases in drilling fluids. Plastic viscosity is the resistance of the drilling fluid to flow caused by mechanical friction within the fluid. With high viscosity, cuttings tend to remain stuck on the bottom of the hole, causing their re-drilling, and this leads to a reduction in the performance of the bit. High viscosity also reduces the hydraulic energy available at the bit nozzles for cleaning, due to parasitic frictional losses in the drill string [9].
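The definition of ROP as the slope of depth over a short time interval can be illustrated with a minimal Python sketch. The function name and the sample depth and time values are hypothetical and serve only to show the finite-difference calculation.

```python
def rop_from_depth(times_hr, depths_ft):
    """Finite-difference ROP (ft/hr) between consecutive depth samples."""
    if len(times_hr) != len(depths_ft) or len(times_hr) < 2:
        raise ValueError("need at least two matching time/depth samples")
    rops = []
    for i in range(1, len(times_hr)):
        dt = times_hr[i] - times_hr[i - 1]
        if dt <= 0:
            raise ValueError("time stamps must be strictly increasing")
        # Slope of depth versus time over the sampling interval.
        rops.append((depths_ft[i] - depths_ft[i - 1]) / dt)
    return rops


# Hypothetical samples: 30 ft drilled in the first half hour, 20 ft in the next.
rop_demo = rop_from_depth([0.0, 0.5, 1.0], [5000.0, 5030.0, 5050.0])
```

Shorter sampling intervals give a value closer to the instantaneous slope but amplify sensor noise, which is one reason field ROP is often averaged over a depth or time window.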

ROP empirical models
Many empirical ROP models have been proposed in the last three decades; however, three of them are quite popular for estimating ROP: (i) Maurer's ROP model, (ii) the Galle and Woods ROP model, and (iii) the Bourgoyne-Young ROP model.

Maurer's model
Maurer [11] developed a ROP model based on a theoretical penetration equation as a function of WOB, RPM, bit size, and rock strength, derived for a roller-cone type bit. A mathematical relation between rate of drilling, WOB, and RPM, based on a perfect hole cleaning condition, was obtained as a function of depth. The ROP equation was thus given as:
$$\frac{dF_D}{dt} = \frac{4}{\pi d_b^{2}}\,\frac{dV}{dt}$$
Here, $F_D$ = footage drilled by bit (ft), $t$ = time (h), $V$ = volume of rock removed, and $d_b$ = diameter of bit.

Galle and Woods' model
Galle and Woods investigated the effects of bit cutting structure dullness, WOB, and RPM on ROP, rate of tooth wear, and bearing life for roller cone bits. The result of their work is a set of graphs and procedures for field applications to determine the best combination of constant WOB and RPM [12]. They presented a drilling rate equation as follows:
$$\frac{dF_D}{dt} = C_{fd}\,\frac{W^{k}\,r}{a^{p}}$$
Here, $C_{fd}$ = formation drillability parameter; $a = 0.028125h^{2} + 6.0h + 1$; $t$ = time (hr); $h$ = bit tooth dullness, fractional tooth height worn away (in); $p$ = 0.5 (for self-sharpening or chipping type bit tooth wear); $k$ = 1.0 (for most formations except very soft formations) or 0.6 (for very soft formations); $r$ = RPM function; and $W$ = function of WOB and $d_b$, such that $W = 7.88\,\mathrm{WOB}/d_b$.

Bourgoyne and Young ROP model
The most popular of the ROP models is the Bourgoyne and Young model. In their work, they presented a mathematical relationship using a complex drilling model to capture the effects of changes in the various drilling parameters. They proposed an eight-function empirical relationship to model the effect of most of the drilling variables [1]. The equation form is:
$$\mathrm{ROP} = \exp\left(a_1 + \sum_{j=2}^{8} a_j x_j\right)$$
Here, $x_j$ = the drilling-parameter function associated with coefficient $a_j$, $a_1$ = formation strength parameter, $a_2$ = exponent of the normal compaction trend, $a_3$ = under compaction exponent, $a_4$ = pressure differential exponent, $a_5$ = bit weight exponent, $a_6$ = rotary speed exponent, $a_7$ = tooth wear exponent, and $a_8$ = hydraulic exponent.
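To make the structure of the Bourgoyne and Young relationship concrete, the short Python sketch below evaluates the widely cited exponential form, ROP = exp(a1 + sum of aj·xj for j = 2 to 8). It assumes the seven drilling-parameter functions x2 to x8 have already been computed elsewhere; the function name and the coefficient values are hypothetical and are not fitted to any well.

```python
import math


def bourgoyne_young_rop(a, x):
    """Evaluate ROP = exp(a1 + sum_{j=2..8} a_j * x_j).

    a: eight coefficients a1..a8 (a[0] holds a1).
    x: seven precomputed parameter functions x2..x8 (x[0] holds x2).
    """
    if len(a) != 8 or len(x) != 7:
        raise ValueError("expected 8 coefficients and 7 parameter functions")
    exponent = a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))
    return math.exp(exponent)


# Hypothetical coefficients; with every x_j at zero only a1 contributes.
a_demo = [3.0, 1e-4, 1e-4, 1e-5, 0.8, 0.5, 0.3, 0.4]
rop_base = bourgoyne_young_rop(a_demo, [0.0] * 7)
```

Because the terms are summed inside the exponential, each effect acts as a multiplicative factor on ROP, which is why the coefficients are usually estimated by multiple linear regression on the logarithm of ROP.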

Hydromechanical specific energy ROP model (HMSE)
Approaching the drilling process as a closed system in terms of energy input, in the form of applied drilling parameters, and a corresponding output, in the form of ROP, brought about the concept of specific energy (SE). This concept was first introduced by Teale [13]. Further work has been done to fully capture the mechanical and hydraulic energy input and their relationship with ROP. The HMSE concept states that "the energy required to remove a unit volume of rock comes primarily from the torque applied on the bit, the weight on bit (WOB), and the hydraulic force exerted by the drilling fluid on the formation" [14]. Specific energy is therefore a significant measure of drilling performance, especially of the cutting efficiency of bits and rock hardness [15]. The equation form is:
$$\mathrm{HMSE} = \frac{F}{A_b} + \frac{120\pi N T + 1154\,\eta\,\Delta p_b\,Q}{A_b \cdot \mathrm{ROP}}$$
Here, HMSE = hydromechanical specific energy in psi, $F$ = WOB in lbs, $N$ = RPM, $T$ = TORQ in lb-ft, $A_b$ = bit cross-sectional area in in², ROP = rate of penetration in ft/hr, $Q$ = mud flow-in rate in gallons per minute, $\eta$ = dimensionless energy reduction factor depending on bit diameter, and $\Delta p_b$ = pressure loss at bit in psi.
The use of the drilling parameters of the HMSE-derived ROP model is proposed in this study because the model fully captures the relevant controllable parameters that affect ROP. Also, from an operational point of view, it is valuable because it provides a reference point for measuring the efficiency and performance of the drilling process in terms of energy input and corresponding output in terms of ROP. The SE concept became a key element of the fast drill process (FDP) [16], the process of drilling with the highest possible ROP within technical and economic limits. In early 2004, Exxon Mobil Corporation used the process to optimize its drilling operations, and the resulting increase in ROP of 133% proved the concept a useful one [16,17].
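As a hedged illustration of the energy balance described above, the following Python sketch evaluates HMSE in the commonly quoted field-unit form, HMSE = F/A_b + (120πNT + 1154ηΔp_bQ)/(A_b·ROP), and inverts it for ROP. The function names, the sample operating values, and the exact constants (120π for rotary work and 1154 for hydraulic work, both following from unit conversion to psi) are assumptions of this sketch rather than values taken from the study.

```python
import math


def _energy_rate(rpm, torque_ftlbf, flow_gpm, eta, dp_bit_psi):
    """Rotary plus hydraulic energy input per hour (ft-lbf/hr), field units."""
    mechanical = 120.0 * math.pi * rpm * torque_ftlbf
    hydraulic = 1154.0 * eta * dp_bit_psi * flow_gpm
    return mechanical + hydraulic


def hmse_psi(wob_lbf, rpm, torque_ftlbf, bit_area_in2, rop_ft_hr,
             flow_gpm, eta, dp_bit_psi):
    """HMSE (psi) under the assumed field-unit form stated in the lead-in."""
    if bit_area_in2 <= 0 or rop_ft_hr <= 0:
        raise ValueError("bit area and ROP must be positive")
    energy = _energy_rate(rpm, torque_ftlbf, flow_gpm, eta, dp_bit_psi)
    return wob_lbf / bit_area_in2 + energy / (bit_area_in2 * rop_ft_hr)


def rop_from_hmse(hmse, wob_lbf, rpm, torque_ftlbf, bit_area_in2,
                  flow_gpm, eta, dp_bit_psi):
    """Rearrange the same relation to express ROP (ft/hr) for a given HMSE."""
    axial = wob_lbf / bit_area_in2
    if hmse <= axial:
        raise ValueError("HMSE must exceed the axial term F/A_b")
    energy = _energy_rate(rpm, torque_ftlbf, flow_gpm, eta, dp_bit_psi)
    return energy / (bit_area_in2 * (hmse - axial))


# Hypothetical operating point for a 12.25 in bit.
area = math.pi / 4.0 * 12.25 ** 2
hmse_demo = hmse_psi(25000.0, 120.0, 8000.0, area, 60.0, 800.0, 0.3, 900.0)
rop_back = rop_from_hmse(hmse_demo, 25000.0, 120.0, 8000.0, area, 800.0, 0.3, 900.0)
```

The inverse function shows why the HMSE inputs are natural predictors of ROP: for a given rock-dependent energy requirement, ROP is fixed by WOB, RPM, torque, flow rate, and bit pressure drop.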

Artificial intelligence (AI) techniques
Artificial intelligence (AI) can be described as the imitation of human intelligence processes by machines, especially computer systems. These processes include the acquisition of information from sets of data and the use of the logic of their interdependency to reach approximate or definite conclusions while self-correcting [18]. The term AI was coined by John McCarthy, an American computer scientist, in 1956 at the Dartmouth Conference, where the discipline was born [19]. According to the Artificial Intelligence Applications Institute (AIAI), the areas of application of AI are: case-based reasoning, a technique for utilizing historical datasets to guide diagnosis and fault finding; evolutionary algorithms, an adaptive search technique with very broad applicability in scheduling, optimization, and model adaptation; planning and workflow, covering modeling, task setting, planning, execution, coordination, and presentation of activity-related information; intelligent systems, an approach to building knowledge-based systems; and knowledge management, the identification of knowledge assets in an organization and support for knowledge-based work [20].
Some of the advantages of AI techniques include, but are not limited to: the ability to model complex, nonlinear processes without a priori assumptions about the relationship between input and output variables; the potential to generate accurate analysis and results from large historical databases; the ability to analyze large datasets to recognize patterns and characteristics in situations where rules are unknown or the relationships and dependencies among variables are complex; and cost-effectiveness, since many AI algorithms have the advantage of execution speed once they have been trained. The ability to train the system with data sets, instead of writing programs, makes it more cost-effective, and changes can be easily implemented when the need arises. Multiple algorithms can be combined, taking advantage of the strengths of each, to develop ensemble AI tools. AI techniques can also be deployed on routine, repetitive tasks, which they complete faster and with fewer errors and defects than humans [21].
The limitations of AI techniques include the fact that some of them are tagged as "black boxes," which merely attempt to chart a relationship between input and output variables based on a training data set. This raises some concerns regarding the ability of the tool to generalize to situations that were not well represented in the data set. However, application of the right domain knowledge helps to address this limitation. Other limitations are the lack of human touch, the enormous processing time for large datasets, and the requirement for high computational resources and skills.
Despite some of the disadvantages of AI techniques, their overwhelming advantages have made them attractive in different fields, including the exploration and exploitation of oil and gas. Recent advances in the collection and transmission of real-time drilling data, coupled with the insufficiency of empirical ROP models to unveil real-time downhole conditions, have led researchers to shift to AI techniques for prediction purposes. Furthermore, the effects of all factors affecting ROP and downhole conditions are inherent in the collected surface drilling data. Applying data-driven predictive analysis has proven useful in decoding the hidden information in these drilling data. Table 1 shows some recent work done using artificial intelligence to predict ROP. ANN has been the most often used. What is also clear from the literature review is that the selection of inputs is not consistent, and some inputs may be difficult to obtain in some instances. Also, for optimization purposes while drilling, some of the variables included in the models are not controllable factors that can be adjusted in real time.

Some artificial intelligence techniques
Below are some of the AI techniques considered in this study. A summary of their characteristics is presented in Table 2.

Artificial neural network (ANN)
Artificial neural networks (ANNs) are designed based on the examination of biological central nervous systems and their neurons, axons, dendrites, and synapses. Similarly, an ANN is composed of elements that are called "neurons," "units," or "processing elements" (PEs). Each PE has a specification of input/output (I/O), and the PEs are connected together to form a network of nodes that mimics biological neural networks; hence the name "artificial neural network." The use of ANN as a reliable universal estimator in constructing nonlinear models from data is very common. It is capable of approximating both linear and nonlinear functions defined over a range of data to the desired degree of accuracy using an appropriate number of hidden neurons; this has been proven mathematically [27]. Being data-driven models, ANNs learn from the training data presented to them and do not require any a priori assumptions about the problem, not even information about statistical distributions. In petroleum engineering, the training data may be assembled from experimental data, past field data, numerical reservoir simulation, real-time data, or a combination of these [5]. Though assumptions are not required, knowledge of the statistical distribution of the input data and domain knowledge of the problem can help to speed up training. Several features, such as the ability to run parallel processes and to apply learning instead of programming, have made ANN an efficient tool in various fields of engineering [28]. In the training process, the weights and biases of the network are adjusted on the basis of learning rules; once training is complete, these fixed weights and biases act as the memory of the network.
Some of the advantages of ANN are as follows. Ability to handle linear and nonlinear models: complex linear and nonlinear relationships can be derived using neural networks. Flexible input/output: neural networks can operate using one or more descriptors and/or response variables, and they can be used with categorical and continuous data. Noise: neural networks are less sensitive to noise than statistical regression models. Some of the major limitations are as follows. Black box models: it is not possible to explain how the results were calculated in any meaningful way. Optimizing parameters: there are many parameters to be set in a neural network, and optimizing the network can be challenging, especially to avoid overtraining [23,27,29-32].

Extreme learning machine (ELM)
The extreme learning machine (ELM) is derived from ANN; it is, however, a generally unified single-layer feed-forward network framework that requires less human intervention and has thus been found to run faster than most conventional neuron-based techniques. This is notably because the learning parameters of its hidden nodes, including input weights and biases, are assigned randomly without any dependency, and because the determination of the output weights involves only a simple generalized operation. The training phase of the ELM algorithm is efficiently completed using a fixed nonlinear transformation, which makes learning fast. The efficiency of ELM in online or real-time applications cannot be overemphasized, as it determines all the network parameters analytically and therefore avoids unnecessary human intervention [33].
Also, the universal approximation ability of the standard ELM with additive or radial basis function (RBF) activation functions has been proved [7,33]. The success of ELM in many real-world problems is well documented, especially in classification and regression problems on very large-scale datasets. ELM is very efficient and effective as an innovative training algorithm for single-hidden-layer feed-forward neural networks (SLFNs) [33].
Some of the merits and limitations of ELM can be summarized as follows. ELM reduces the computational burden without sacrificing the generalization capability in the expectation sense. ELM needs much less training time than popular ANN and SVM/SVR. The prediction accuracy of ELM is usually slightly better than that of ANN and close to that of SVM/SVR in many applications. Compared with ANN and SVR, ELM can be implemented easily, since there is no parameter to be tuned except an insensitive parameter L. It should be noted that many nonlinear activation functions can be used in ELM [33]. The limitations are that ELM suffers from both uncertainty and generalization degradation problems and that, for the widely used Gaussian-type activation function, the generalization capability of ELM is degraded [34].
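The ELM training procedure described above, with randomly assigned hidden-node parameters and analytically determined output weights, can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in the study: the tanh activation, the Moore-Penrose pseudo-inverse solution, the function names, and the synthetic data are all assumptions.

```python
import numpy as np


def elm_fit(X, y, n_hidden=30, seed=0):
    """Train a single-hidden-layer ELM for regression."""
    rng = np.random.default_rng(seed)
    # Hidden-node weights and biases are drawn at random and never updated.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # fixed nonlinear transformation
    # Output weights follow from one least-squares solve (pseudo-inverse).
    beta = np.linalg.pinv(H) @ y
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the same random hidden layer, then the fitted output weights."""
    return np.tanh(X @ W + b) @ beta


# Synthetic two-input regression problem, for illustration only.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.uniform(-1.0, 1.0, size=(200, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1]

W_demo, b_demo, beta_demo = elm_fit(X_demo, y_demo, n_hidden=30)
pred_demo = elm_predict(X_demo, W_demo, b_demo, beta_demo)
train_rmse = float(np.sqrt(np.mean((pred_demo - y_demo) ** 2)))
```

No iterative weight update appears anywhere in the sketch, which is the source of the short training time; the number of hidden nodes is the only setting that has to be chosen.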

Support vector regression (SVR)
Support vector regression (SVR) methodology involves a group of related supervised learning methods employed for both regression and classification problems. These methods fall in the category of generalized linear classifiers (GLCs). In SVR, a maximal-margin hyperplane is constructed in the high-dimensional feature space into which the input vectors are mapped. The method was initially designed as a classifier and was modified in a later study by Vapnik [35] into a support vector regressor (SVR) for regression problems. Its robustness in single-model estimation has been established [36]. Hence, it can be considered invaluable for the estimation of both real-valued and indicator functions, as is common in regression and pattern recognition problems, respectively.
When used as a regressor, SVR attempts to choose the "best" model from a list of possible models (i.e., approximating functions) $f(x, \omega)$, where $\omega$ is a set of generalized parameters. Generally, "good" models are those that can generalize their good predictive performance to an out-of-sample test set. This is often determined by how well the model minimizes the cost function while training with the training data. The core feature of SVR that accounts for its attractive properties is the notion of an ε-insensitive loss function. SVR is suitable for estimating the dominant model under a multiple-model formulation, where the objective function can be viewed as a primal problem and its dual form can be obtained by constructing the Lagrange function and introducing a set of (dual) variables.
The generalization characteristics of SVR are ensured by the special properties of the optimal hyperplane that maximizes the distance to training examples in a high-dimensional feature space. It has been shown to exhibit excellent performance [32]. The merits and limitations of SVR are summarized as follows. Merits: SVR can deal with very high-dimensional data, can learn very elaborate concepts, and usually works very well. Limitations: it requires both positive and negative examples; a good kernel function must be selected; it consumes a lot of memory and CPU time; and there are some numerical stability problems in solving the constrained optimization problem [30,37,38]. Analysis of (linear) SVR indicates that the regression model depends mainly on the support vectors on the border of the ε-insensitive zone, and that the SVR solution is very robust to "outliers" (i.e., data samples outside the ε-insensitive zone). These properties make SVR very attractive for use in an iterative procedure for multiple model estimation.
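The ε-insensitive loss mentioned above can be illustrated with a short Python sketch: residuals smaller than ε cost nothing, and larger residuals are penalized linearly by the amount they exceed ε. The function name and the sample values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y - f(x)| - epsilon)."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Residuals of 0.3, 1.0, and 1.5 against a tube half-width of 0.5.
losses = epsilon_insensitive_loss([10.0, 10.0, 10.0], [10.3, 11.0, 8.5], epsilon=0.5)
# -> [0.0, 0.5, 1.0]
```

Only the samples with non-zero loss, or lying exactly on the tube boundary, become support vectors, which is why the fitted model depends on a subset of the training data.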

Least square support vector regressions (LS-SVR)
LS-SVRs are reformulated versions of the original SVRs algorithm for classification and function estimation, which maintains the advantages and the attributes of the original SVRs theory. LS-SVRs are closely related to regularization networks and Gaussian processes but additionally emphasize and exploit primal-dual interpretations [39]. LS-SVR possesses excellent generalization performances and is associated with low computational costs. LS-SVR requires less effort in model training in comparison to the original SVR, owing to its simplified algorithm. It minimizes a quadratic penalty on the slack variables which allows the quadratic programming problem to be reduced to a set of matrix inversion operations in the dual space, which takes less time compared to solving the SVR quadratic problem [40]. Robustness, sparseness, and weightings can be incorporated into LS-SVRs where needed and a Bayesian framework with three levels of inference has also been developed [41]. Some of its limitations include being ineffective at handling non-Gaussian noise as well as being sensitive to outliers [42].
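The reduction of LS-SVR training to linear algebra in the dual space can be sketched as follows. The sketch assumes the standard LS-SVM regression formulation, in which the bias b and the dual weights α solve the linear system [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y]. The RBF kernel, the values of γ and σ, the function names, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=100.0, sigma=0.2):
    """Solve the LS-SVR dual system for the bias b and the weights alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)               # one linear solve, no QP
    return solution[0], solution[1:]


def lssvr_predict(X_new, X_train, b, alpha, sigma=0.2):
    """Prediction f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic one-dimensional example, for illustration only.
X_demo = np.linspace(0.0, 1.0, 25).reshape(-1, 1)
y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])
b_demo, alpha_demo = lssvr_fit(X_demo, y_demo)
pred_demo = lssvr_predict(X_demo, X_demo, b_demo, alpha_demo)
max_err = float(np.max(np.abs(pred_demo - y_demo)))
```

Every training point receives a non-zero α in this formulation, which reflects the loss of sparseness relative to SVR and is consistent with the sensitivity to outliers noted above.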

Case study
A case study is presented below to illustrate one of the advantages inherent in combining AI techniques with domain expert knowledge for improved prediction and optimization of drilling rate of penetration.

Data description
In this study, data from two development wells from onshore Niger Delta hydrocarbon province were used for the development and testing of the models, in each of the AI algorithms compared. The field is about 95 square kilometers in extent with a northwest-southeast trending dual culmination rollover anticline. The wells chosen represents the best in terms of drilling performance as measured by best ROP and bit runs for all the three hole sections considered. The formations encountered are mainly consolidated intercalation of shales and shallow marine shoreface sands with a normal compaction trend, a typical elastic depositional environment of the Niger Delta. The field is a mainly gas field with some of the reservoirs having significant oil rims.
The wells used for the study were selected for ROP prediction because they were the best in class in terms of drilling performance, a result of carefully optimized drilling parameters and practices. The repeatability of such feat is highly desirable, and hence the choice of the wells. The formations encountered are well correlated across the field with lateral continuity. These two wells fairly represents the field with Well-A located in the Eastern flank of the field while Well-B is located 8 km to the west of Well-A and just about 3 km to the field western boundary. While Well-A is highly deviated and deeper in reach with maximum inclination of 74 at total depth of 11,701 ft TVD, Well-B is slightly deviated with maximum inclination of 23 at total depth of 9000 ft TVD The wells are also similar in terms of drilling equipment, the same rig was used for their construction; bit type and bottom hole assembly (BHA) used were same, hence, they were both drilled with the same bottom hole hydraulics. Details of the bit used in the three hole sections included in this research are presented in Table 3 As explained in Section 2.4, the specific energy concept in the drillability of a formation is being explored in this study with particular focus on hydromechanical specific energy, HMSE. The HMSE concept states that "the energy required to remove a unit volume of rock comes primarily from the torque applied on the bit, the weight on bit (WOB), and the hydraulic force exerted by the drilling fluid on the formation" [14]. Drilling data from surface data logging (SDL) tools were used in this study. These were real-time data collected at surface and could be transmitted via satellite to a central location while drilling. 
The numerous data usually collected include measured depth (MD), hookload (HKLD), weight on bit (WOB), pipe rotation per minute (RPM), rotary torque (TORQ), mud flow-in rate (GPM), total gas (TG), pump strokes per minute (SPM), pits volume change, mud flow-out rate percentage (FFOP%), and mud weight in (MW), among others. Since ROP prediction using the hydromechanical specific energy ROP model is the focus of the research, a conscious effort was made to use as many of the data that affect ROP as possible. Given the HMSE Eqs. (6) and (7) in Section 2.4 [14], it is therefore necessary to reorganize the collected data and focus on those with a physical relationship to ROP based on the HMSE-ROP model.
It is important to mention that the surface drilling mechanics data are inexpensive to collect during drilling operations; the sensors can be calibrated without disturbing drilling operations and are a must-have for drilling operations. Hence, continuous drilling data such as MD, WOB, RPM, flow rate, mud weight, bit size, TORQ, and standpipe pressure (SPP) from the two wells were used in this study. Data quality checks were performed on individual wells, and simple activity logic was applied to ensure that only on-bottom drilling data were used. Noise resulting from sensor issues and spurious data points were filtered out, first by using the activity code to sort the data and then by manually removing out-of-range data points in an Excel spreadsheet.
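The filtering described above was done with activity codes and a spreadsheet. As a minimal sketch of the same two-stage logic in Python, the block below keeps only on-bottom records and then drops records with missing or out-of-range channels. The activity label, the record layout, and every limit in LIMITS are assumptions made for illustration and are not values from the study.

```python
# Hypothetical activity label and operating ranges; none come from the study.
ON_BOTTOM = "DRILLING"

# Assumed plausible (min, max) range per channel, in typical field units.
LIMITS = {
    "WOB": (0.0, 80.0),     # klbf
    "RPM": (0.0, 250.0),    # rev/min
    "TORQ": (0.0, 40.0),    # kft-lbf
    "SPP": (0.0, 5000.0),   # psi
    "GPM": (0.0, 1500.0),   # gal/min
    "ROP": (0.0, 500.0),    # ft/h
}


def is_valid(record):
    """Return True if the record is on-bottom and every channel is in range."""
    if record.get("activity") != ON_BOTTOM:
        return False
    for name, (low, high) in LIMITS.items():
        value = record.get(name)
        # Missing values and NaN both fail this test and are rejected.
        if value is None or not (low <= value <= high):
            return False
    return True


def clean(records):
    """Keep only valid on-bottom drilling records."""
    return [r for r in records if is_valid(r)]


def make_record(activity, wob, rpm, torq, spp, gpm, rop):
    """Small helper to build an illustrative record."""
    return {"activity": activity, "WOB": wob, "RPM": rpm, "TORQ": torq,
            "SPP": spp, "GPM": gpm, "ROP": rop}


raw = [
    make_record("DRILLING", 25.0, 120.0, 12.0, 3100.0, 850.0, 95.0),
    make_record("TRIPPING", 0.0, 0.0, 0.0, 200.0, 0.0, 0.0),          # off-bottom
    make_record("DRILLING", -999.25, 120.0, 12.0, 3100.0, 850.0, 90.0),  # sensor null
]
cleaned = clean(raw)
print(len(cleaned), "of", len(raw), "records kept")
```

Keeping the limits in one table makes the plausibility check easy to review against operational knowledge before any model is trained.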

Details of the experiment/methodology
The following approach was used to prepare the models using data from the selected wells:
1. Collect and explore the datasets: raw data from the two wells, which included several drilling equipment parameters, were explored to analyze the properties of attributes of interest as they relate to the objective of the study. Eight measured drilling parameters of interest were eventually selected for this study.
2. Data integrity check: verify the data quality and check the plausibility of values from an operational point of view.
3. Sorting of data: use the drilling activity code to separate on-bottom values of the identified predictors (drilling parameters from the HMSE-ROP model to be used for ROP prediction in the AI models). Clean the datasets by removing noise resulting from either sensor calibration issues or equipment malfunction, using operational background knowledge. The total number of drilling variables used as predictors of ROP is presented in Table 4.
Statistical properties of the data, such as the mean, median, and standard deviation, were computed before training the learning models. Statistical analysis helped to reveal certain characteristics of the datasets; one important characteristic is the standard deviation, as can be seen in Tables 5 and 6.
The standard deviation reveals that the dataset varies widely as a result of the different lithological units penetrated; data normalization was therefore carried out as part of preprocessing. This brought the various data within the same range to align their distributions and prevented the model from being biased toward the large values present in the dataset [6].
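The text does not state which normalization scheme was applied. As a minimal sketch, assuming min-max scaling to the range [0, 1], the block below computes the summary statistics mentioned above and rescales two channels whose raw magnitudes differ by orders of magnitude. The sample values are illustrative only and are not data from the wells.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a channel."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed scheme, not confirmed by the source)."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A constant channel carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


# Illustrative values only: WOB in klbf and SPP in psi.
wob = [10.0, 20.0, 30.0, 40.0]
spp = [2500.0, 3000.0, 3500.0, 4500.0]

for name, channel in (("WOB", wob), ("SPP", spp)):
    print(name, summarize(channel), min_max_normalize(channel))
```

After scaling, both channels span the same range, so a predictor with large raw values such as SPP no longer dominates one with small raw values such as WOB.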
Data splitting and model development: to ensure a uniform distribution of the data points and remove the effect of biased sampling, the normalized data were randomized before use in model development. Data from each of the two wells were randomly split into 70% for training, 15% for testing, and 15% for validation; with these subsets, the algorithms were trained and modified to arrive at an acceptable model for testing in each of the artificial intelligence techniques.
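The randomized 70/15/15 split can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions for illustration; the study performed this step in MATLAB and does not describe those details.

```python
import random


def split_dataset(rows, train=0.70, test=0.15, seed=0):
    """Shuffle rows, then split into training, testing, and validation subsets.

    The validation subset receives whatever remains after the training and
    testing subsets are taken, so no row is lost to rounding.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_train = int(round(train * len(rows)))
    n_test = int(round(test * len(rows)))
    training = rows[:n_train]
    testing = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]
    return training, testing, validation


# Row indices stand in for the 3641 Well-A records.
training, testing, validation = split_dataset(range(3641))
print(len(training), len(testing), len(validation))
```

Using the same seed for every algorithm gives all four techniques identical subsets, which is one way to keep the comparison unbiased.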

Well code | No. of data | Utilized drilling parameters (predictors)
Well-A (Dataset 1) | 3641 | WOB, RPM, TORQ, SPP, GPM
Data integrity and similarity were also preserved in all methods to avoid bias in evaluating the different algorithms across the four AI techniques.
Model development: the ANN was implemented using the MATLAB® ANN toolbox. The implementation was based on the backpropagation algorithm with momentum and adaptive learning rate, and the sigmoidal functions. The ELM implementation was based on the MATLAB® regularized ELM codes found in ELM algorithms [43]. The SVR and LS-SVR models were implemented using the least-squares SVM (LS-SVM) proposed by Valyon and Horvath [44], combined with other functions found in the LS-SVMlab1.8 code [45]. The code was slightly modified to include the heavy-tailed RBF (htrbf) kernel proposed in Chapelle et al. [46].
Train models and cross-validate to select the best model: in training the ANN model, the weights and biases of the networks were updated by the Levenberg-Marquardt (LM) algorithm, while the numbers of hidden layers and neurons were randomly investigated from 1 to 5 and from 10 to 100, respectively, in a loop. The algorithm was run 500 times, and the models that gave the least RMSE values in the cross-validation results were selected. A similar procedure was used in training the ELM models, except that the number of neurons ranged from 50 to 5000. In training the SVR and LS-SVR models, the hyper-parameters of the algorithms (the epsilon tube (epsilon), tuning parameter (C), lambda, and kernel for SVR; the tuning parameter (gam) and kernel for LS-SVR) were optimized using the cross-validation technique. For each run, a kernel function was chosen and investigated over different ranges of values of the other parameters in a loop. The kernel function and corresponding hyper-parameters with the least RMSE value during cross-validation of each run were identified as the best model. Table 7 shows the final selected model hyper-parameters.
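The selection loop for LS-SVR can be illustrated with a small, self-contained sketch. This is not the LS-SVMlab code used in the study: it is a plain-Python least-squares SVR with a standard RBF kernel, tuned by a grid search that keeps the (gam, sigma) pair with the lowest RMSE on a held-out validation set. The grid values, the kernel width name sigma, and the synthetic sine data are assumptions for illustration.

```python
import math


def rbf(a, b, sigma):
    """Standard RBF kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_ls_svr(X, y, gam, sigma):
    """Solve the LS-SVM regression system [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gam
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, support values alpha


def predict(X_train, bias, alphas, sigma, x):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, X_train))


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def grid_search(X_tr, y_tr, X_val, y_val, gams, sigmas):
    """Return (validation RMSE, gam, sigma) for the best pair on the grid."""
    best = None
    for gam in gams:
        for sigma in sigmas:
            bias, alphas = fit_ls_svr(X_tr, y_tr, gam, sigma)
            preds = [predict(X_tr, bias, alphas, sigma, x) for x in X_val]
            score = rmse(y_val, preds)
            if best is None or score < best[0]:
                best = (score, gam, sigma)
    return best


# Synthetic one-feature data (illustration only, not drilling data).
X_tr = [[i / 29.0] for i in range(30)]
y_tr = [math.sin(2.0 * math.pi * x[0]) for x in X_tr]
X_val = [[(i + 0.5) / 10.0] for i in range(10)]
y_val = [math.sin(2.0 * math.pi * x[0]) for x in X_val]

gams = [1.0, 10.0, 100.0, 1000.0]
sigmas = [0.05, 0.2, 1.0]
best_rmse, best_gam, best_sigma = grid_search(X_tr, y_tr, X_val, y_val, gams, sigmas)
print("best validation RMSE:", best_rmse, "gam:", best_gam, "sigma:", best_sigma)
```

Training an LS-SVR reduces to one linear solve per hyper-parameter pair, so an exhaustive loop over a small grid stays cheap; the same loop structure applies to the kernel choice if several kernel functions are supplied.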
Testing and evaluation of models: the models were tested using the testing data, and the three evaluation criteria (cc, RMSE, and testing time) were recorded for model evaluation.
Table 7. Summary of optimized parameters used in the implementation of models.
The flowchart presented in Figure 3 summarizes the processes.
Three performance measures, root mean square error (RMSE), correlation coefficient (cc), and testing time, were used to assess the performance of the algorithms.

Performance assessment criteria
To establish a valid evaluation of the performance of the different AI techniques being compared, the assessment criteria used in petroleum journals were adopted as the measures of performance [27,32]. The criteria are as follows.

Correlation coefficient (CC)
This is a measure of the strength of the relationship between the predicted values and the actual values. It indicates how closely the model predictions track the real values, with high values indicating good performance and low values indicating poor performance.
$$ cc = \frac{\sum (y_a - y'_a)(y_p - y'_p)}{\sqrt{\sum (y_a - y'_a)^2 \, \sum (y_p - y'_p)^2}} \qquad (8) $$
where y_a and y_p are the actual and predicted values, and y'_a and y'_p are their respective means.
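As a worked illustration of Eq. (8), the sketch below computes the correlation coefficient in its standard Pearson form for a short list of actual and predicted values. The function name and the sample ROP values are hypothetical.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    numerator = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    denominator = math.sqrt(
        sum((a - mean_a) ** 2 for a in actual)
        * sum((p - mean_p) ** 2 for p in predicted)
    )
    return numerator / denominator


# Illustrative ROP values in ft/h (not data from the study).
actual_rop = [50.0, 60.0, 70.0, 80.0]
predicted_rop = [52.0, 59.0, 71.0, 78.0]
print(correlation_coefficient(actual_rop, predicted_rop))
```

A value close to 1 means the predictions rise and fall with the measurements. Because cc is insensitive to a constant offset or scale error, it is paired with RMSE in the assessment.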

Root mean-squared error (RMSE)
This can be interpreted as the standard deviation of the differences between the predicted values and the corresponding observed values. It is a measure of absolute fit and indicates how close the predicted values are to the actual observed values.
$$ rmse = \sqrt{\frac{\sum (y_a - y_p)^2}{n}} $$
where y_a and y_p are the actual and predicted values and n is the number of data points.
The strategy followed is to implement the four techniques under the same data and processing conditions as described above to avoid bias in evaluating the different algorithms [29,30,47]. Also, the design of the individual models used the cross-validation technique to select the optimal tuning hyper-parameters with the validation data set, using the RMSE evaluation criterion to measure their performance. Runs for each of the techniques were repeated several times in a loop in order to optimize the hyper-parameters of the models, with cross-validation used to select the best model for each algorithm. The testing data were then run on the selected model, and cc, RMSE, and testing time were recorded to evaluate the model for comparison.
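The recording of RMSE and testing time can be sketched as below. The evaluate function and the toy linear model are hypothetical stand-ins for a trained ANN, ELM, SVR, or LS-SVR predictor; the timing call only measures the prediction step, which is how testing time is assumed to be defined here.

```python
import math
import time


def rmse(actual, predicted):
    """Root mean-squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def evaluate(model, X_test, y_test):
    """Run the model on the testing data and record RMSE and testing time."""
    start = time.perf_counter()
    predictions = [model(x) for x in X_test]
    elapsed = time.perf_counter() - start
    return {"rmse": rmse(y_test, predictions), "testing_time_s": elapsed}


def toy_model(x):
    """Hypothetical stand-in for a trained ROP predictor."""
    return 2.0 * x[0] + 1.0


X_test = [[0.0], [1.0], [2.0]]
y_test = [1.0, 3.0, 6.0]
result = evaluate(toy_model, X_test, y_test)
print(result)
```

Passing every trained model through the same evaluate function, with the same testing data, keeps the accuracy and timing figures comparable across algorithms.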

Experimental results and discussion
In the implementation of each of the techniques tested for ROP prediction, the training, validation, and testing data described above were used. The two datasets are:
• Dataset A, which comprises eight HMSE-ROP-related drilling parameters from Well-A.
• Dataset B, which comprises eight HMSE-ROP-related drilling parameters from Well-B.
The datasets are presented in Table 8. Tables 9 and 10 show the results of the four AI algorithms used for ROP prediction in the study. After several runs, the best model from each algorithm was tested and evaluated. The algorithms were independently tested with the eight drilling parameters presented in Table 8.

Discussion of results
Each of the four AI techniques tested showed competitive performance, as shown in the results. Figures 4-6 show the performance of the four techniques on each dataset during both training and testing, and therefore reveal their comparative strengths and weaknesses. The comparative results of the four AI techniques applied to the two datasets using the same drilling parameters are plotted in Figure 4.
RMSE and CC, as defined earlier, are measures of accuracy, with the algorithm exhibiting the lowest RMSE and highest CC being the most accurate predictor. In Figure 4, a cross-plot of the testing correlation coefficient (cc) against the testing root mean square error (RMSE) shows that in Well-A the most accurate algorithm is LS-SVR, followed closely by SVR, while ELM and ANN are the least accurate. The same pattern is repeated in Well-B, with LS-SVR exhibiting the best performance and ANN and ELM performing similarly to each other. The overall best performance is that of LS-SVR in Well-B. This is attributed to the data density in Well-B, as seen in Table 3. LS-SVR therefore provides an excellent function estimation capability.
Comparing the testing times in Tables 7 and 8, plotted in Figure 5, it is evident that among the four algorithms tested, LS-SVR and SVR require a considerable amount of time for model testing in both wells, while ANN and ELM require the least time for the same process. The density and amount of data used for Well-B, as can be seen in Table 3, are evidently responsible for the extra time it takes to test the model.
The application of domain knowledge, and in particular the use of the specific energy concept in selecting the controllable drilling parameters used in the prediction of ROP, has proven valuable, with all the AI models showing accuracy within an acceptable range. A depth plot of actual ROP against the predicted ROP from all the AI models is presented in Figure 6.
As can be observed, the qualitative differences among the predictions are hard to discern, showing that the four AI models are good predictors with reasonable accuracy.
In summary, LS-SVR produces the best ROP model for the two datasets in terms of accuracy, but it requires a considerable amount of testing time compared with the other AI techniques. It is therefore more suitable for situations where accuracy is most important. ELM and ANN, by contrast, require the shortest testing time but are less accurate; they are more suitable for scenarios where execution time is critical. It must be stated, however, that the use of drilling domain knowledge in the choice of drilling parameters, with variables from the HMSE-ROP model as input, has enhanced the accuracy of the ROP predicted by all the AI algorithms to within an acceptable range.

Conclusion
AI techniques have increasingly proved to be of immense value in the oil and gas industry, where they have been employed by different segments of the industry. Traditional methods have not been able to achieve such an impact in such a short time, because AI methods are able to decipher hidden patterns and complex relationships within the enormous data collected daily during drilling operations. However, application of the right domain expert knowledge has been shown to improve performance in the deployment of AI techniques. These techniques and their application lead to time and cost savings, minimized risk, improved efficiency, and solutions to many optimization problems. The ability of the techniques to retrain themselves with live data within a shorter time has made them a major building block for drilling automation.
This paper presents an improved methodology for predicting ROP with real-time drilling optimization in mind. Recent studies on the use of AI in the prediction of ROP show some inconsistency in the selection of input variables. The parameters used in this study are the must-have, easily accessible parameters, most of which can be adjusted while drilling and are therefore controllable. The use of the HMSE-ROP model has also enhanced the performance of the models as a result of selecting a few variables with an established, even though nonlinear, relationship to ROP. All the methods used provided a good degree of accuracy, and therefore present engineers with options to use whichever algorithm is suitable for their