Parameters and explanations of the GWO algorithm.

## Abstract

In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of the pile does not constrain the origin and destination of the journey. The travel demand of the user can be derived from the use of the shared bicycle. The goal of this article is to predict the probability of transition for a shared bicycle user destination based on a deep learning algorithm and a large amount of trajectory data. This study combines eXtreme Gradient Boosting (XGBoost) algorithm, stacked Restricted Boltzmann Machines (RBM), support vector regression (SVR), Differential Evolution (DE) algorithm, and Gray Wolf Optimization (GWO) algorithm. In an experimental case, the destinations of the cycling trips and the probability of traffic flow transfer for shared bikes between traffic zones were predicted by computing 2.46 million trajectory points recorded by shared bikes in Beijing. The hybrid algorithm can improve the accuracy of prediction, analyze the importance of various factors in the prediction of transfer probability, and explain the travel preferences of users in the pile free bicycle-sharing scheme.

### Keywords

- deep learning
- restricted Boltzmann machines
- support vector regression
- eXtreme gradient boosting
- shared bikes data

## 1. Introduction

Bicycle sharing is a new type of transportation with low energy consumption and emissions. It serves short-distance travel and helps solve the “last mile” problem [1]. With the rapid development of the mobile Internet, the pileless bicycle began to replace the pile station bicycle [2]. In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of the pile does not constrain the origin and destination of the journey. The travel demand of the user can be derived from the use of the shared bicycle. The distribution of destinations for shared bike users is a valuable study. However, the large number of shared bicycle tracks requires a lot of computation time. This paper sets up different traffic areas and studies the law of shared bicycle flow transfer between the traffic areas. On this basis, we predict the ratio of the traffic flow of shared bicycles between traffic areas. It can be considered as the probability that the shared bicycle user selects the traffic area A as the origin and the traffic area B as the destination.

The pile-free shared bicycle system puts a large number of bicycles in the city. The amount is much higher than the number of traditional piled public bicycles. Therefore, when dealing with massive volume of trajectory data volume as a data set, classical statistical methods and traditional neural network algorithms would have limited processing capabilities.

As a newly developed travel method, the algorithms for the destination prediction of trips based on shared bikes need to be researched in depth [3, 4, 5]. In Deep neural networks (DNN), the model with multi-hidden layers can be developed based on the artificial neural network. The hidden layers of DNN convert the input data into a more abstract compound representation [6, 7, 8, 9, 10].

The Restricted Boltzmann Machine (RBM) is an algorithm that can be used for dimensionality reduction, classification, regression, and feature learning problems. RBM reconstructs data in an unsupervised algorithm and adjusts the weight through the process of reverse transfer and forward transfer. The RBM gradually approaches the original input and learns the probability distribution on the input set [11, 12, 13, 14, 15].

In this paper, a stacked RBM-SVR algorithm is constructed by combining support vector regression (SVR) [16] and stacking RMB algorithm. RBM-SVR is used to predict continuous output values. The error penalty factor

In machine learning algorithms, Mirjalili et al. [20] proposed Gray Wolf Optimizer (GWO) as a meta-heuristic algorithm for solving many multi-modal functions in 2014. In addition, Storn and Price [21] proposed a differential evolution (DE) algorithm. The DE algorithm is an optimization algorithm based on modern intelligent theory. DE intelligently optimizes the direction of search through groups generated by cooperation and competition among individuals. Based on the above theory, an algorithm called the differential evolution Gray Wolf Optimizer algorithm (DEGWO) is used to optimize

After processing by RBM algorithm, the input data are transformed into a sparse matrix containing a high amount of information. It can reduce the computation time of SVR prediction, but it may also increase the number of outliers in the SVR algorithm, increase the complexity of the SVR model and reduce the stability of the fitting process. To solve this problem, this paper proposes a hybrid algorithm that combines the eXtreme Gradient Boosting (XGBoost) algorithm with the stacked RBM-SVR algorithm. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable [22]. XGBoost uses the Exclusive Feature Bundling method to transform several sparse features into a dense matrix [23]. In the process, the approximate greedy algorithm is used to find the best combination of merged features when the number of bundles is the smallest. The XGB algorithm realizes the partition of nodes by the second order gradient and optimizes the greedy algorithm of splitting nodes by the local approximation algorithm [24].

The principal purpose of this paper is to build a hybrid model that combines the XGBoost model, the stacked RBM-SVR network, and the DEGWO optimization algorithm. This paper analyzes the trajectory data of shared bicycles, extracts the cell information, and predicts the probability of user destination selection in the traffic area, that is, predicts the transfer probability of shared bikes.

## 2. Background

Artificial intelligence (AI) is a domain of computer science that studies how to apply computings to simulate the fundamental theories, methods, and techniques of human knowledge. As the mainstream algorithm of artificial intelligence, deep learning is considered capable of solving many challenges in the field of computer vision, prediction, and optimization. It realizes the automatic positioning of targets and automated learning of target features, which improves the speed and accuracy of target detection. Artificial intelligence is mainly used to share bicycles in the following aspects.

First, the user’s travel behavior and the law of spatial movement can be obtained through machine learning algorithms and statistical theory analysis. The user’s travel preferences can be quantitatively analyzed. Researchers can discuss the impact of various influencing factors on shared bicycle usage, such as the mix of land use, the degree of convergence with public transport facilities, the sharing of bicycle infrastructure, rainfall and high temperatures [25, 26].

Second, through the deep learning algorithm of AI technology, the dynamic demand and parking demand of the shared bicycle users can be predicted. The focus of this paper is on this issue. This paper uses a deep learning algorithm to predict the probability of user destination selection. Xu et al. [27] prove that the long short-term memory neural networks (LSTM NNs) in deep learning algorithms are superior to traditional statistical metrology algorithms and advanced machine learning algorithms. LSTM NNs can better predict the riding demand dynamically. Besides, based on the distribution of road networks and travel needs, researchers can predict parking demand and develop better layout strategies for electronic fences [28].

Finally, according to the deep reinforcement learning algorithm in AI technology, a shared bicycle scheduling model can be constructed. Deep reinforcement learning combines the perception of deep learning with the decision-making ability of reinforcement learning. It can be directly controlled according to the original input data. It is an artificial intelligence method that is closer to human thinking. Based on this algorithm, the dynamic scheduling model can efficiently optimize goals such as improving user satisfaction and reducing system cost [29, 30, 31].

## 3. A stacked RBM_SVR deep learning algorithm

RBM_SVR is a deep learning model that connects three stacked RBM models and one SVR model. First, in RBM_SVR, the bottommost RBM is trained with the original input data, and the top RBM takes the feature extracted by the bottom RBM as input and continues training. RBM_SVR repeats this process until the topmost RBM model is trained. Secondly, RBM_SVR fine-tunes the network through the traditional global learning algorithm (BP algorithm), so that the model can converge to the local best. Finally, RBM_SVR can efficiently train a deep network and output the predicted probability value according to the SVR model.

Each RBM model has a visible layer

where

where

where

## 4. An improved RBM-SVR algorithm

### 4.1 Principles of GWO algorithm

Assume that in a D-dimensional search space, the population size

Parameter | Interpretation |
---|---|

The number of current iterations | |

The swing factor | |

The position of the prey after the | |

The position of the gray wolf during the tth iteration | |

A random number within | |

The convergence factor, | |

A random number uniformly distributed within [0,1] | |

The distances between the individual gray wolves, |

The update process of

where

### 4.2 Principles of the DE algorithm

Assume that in the D-dimensional search space, in the population size NP, *g*th generation of the population, *k*th individual in the *g*th generation of the population,

#### 4.2.1 Initialization of the population

Initially, the algorithm randomly generates the 0th generation of the population over the entire search space, and the value of the individual

where

#### 4.2.2 Mutation

Mutant individual is generated via Eq. (9).

where

#### 4.2.3 Crossover

The crossover process in the DE algorithm is expressed as Eq. (10).

where CR is the crossover probability within

#### 4.2.4 Selection

Selection operation compares the vector

Therefore, this mechanism allows the populations of the offspring to evolve based on the current population. This optimization mechanism can improve the average optimization ability of the population and converge the optimal solution.

### 4.3 DEGWO algorithm

In the DEGWO algorithm,

Algorithm 1. RBM_SVR _DEGWO |

Input: Output: Initialize for each individual wolf doend forsort compute top three gray wolf individuals for each generation doupdate for each individual wolf docompute mutant individuals compute children population end forfor each individual wolf doupdate end forend forupdate the parameters in DBN RBM_SVR ( return |

## 5. A hybrid model based on RBM SVR DEGWO and XGBoost

### 5.1 XGBoost principle

#### 5.1.1 Model function form

This paper assumes that the data set as the input sample is

Here,

In Eq. (13),

#### 5.1.2 XGBoost learning objective function

The objective function based on the parameter space is shown in the following Eq. (14).

where

#### 5.1.3 Model optimization

In the model parameter optimization process, each iteration model is always added a new function on the optimal model obtained from the previous training. After the

The objective function can be rewritten to Eq. (17).

In formula (17), the model’s goal is to learn the function of the kth tree. When the error function is replaced by a second-order Taylor expansion, the objective function can be rewritten as Eq. (18).

When

This objective function solves regression, classification, and sorting problems. Eqs. (20) and (21) are in the form of a tree structure of the regression tree function and the regular term. The objective function can be updated to Eq. (22).

This article defines the sample set on each leaf node as

This paper assumes that the structure of the tree is a certain value (i.e.,

#### 5.1.4 Structure score

Eq. (25) is a function for scoring a tree structure, called structure score. The smaller the score, the better the tree structure is. The algorithm searches for the optimal tree structure by using Eq. (25).

In Eq. (26),

#### 5.1.5 Best branch

The core part of the XGBoost algorithm is to obtain the optimal node based on the maximum gain obtained. XGBoost looks for the best branch using a greedy algorithm. The greedy algorithm traverses all possible segmentation points of all features, calculating the Gain value and selecting the maximum value to complete the segmentation. The greedy algorithm is an algorithm that controls the local optimum to achieve global optimization. The decision tree algorithm can also be considered as a method of greedy algorithm. XGBoost is an integrated model of the tree. If each leaf is optimal, the overall generated tree structure is optimal. This avoids enumerating all possible tree structures. XGBoost uses the objective function to measure the structure of the tree, and then let the tree grow from depth 0. Each time a branch calculation is implemented, XGBoost calculates the reduction in the objective function. When the reduction is below a certain value, the tree will stop growing.

### 5.2 Hybrid model based on RBM_SVR_DEGWO and XGBoost

After the boosting tree is created, the XGBoost algorithm extracts the importance score for each attribute. The XGBoost importance score measures the value of features in improving decision tree construction. The more an attribute is used to build a decision tree, the more important it is [35]. In order to further improve the accuracy of prediction and analyze the importance of feature quantity, this paper uses XGBoost to extract the feature quantity importance score. By combining the proposed RBM_SVR _DEGWO model prediction value, this paper proposes a hybrid prediction model, as shown in Table 3 .

Algorithm 2. Hybrid Algorithm based on RBM_SVR_DEGWO and XGBoost |

Input: data sets normalization Initialize Initialize for each dofor doend forend forSplit with max score Output: |

## 6. Experimental description and result analysis

This paper analyzes 2,468,059 trajectory data from Mobike’s shared bikes. The data covers more than 300,000 users and 400,000 shared bikes. The data of each rental trip includes the start time, the end time, the Geohash code of the starting position, the Geohash code of the ending position, the bicycle ID and the user ID.

GeoHash is an algorithm for spatial indexing. In the GeoHash theory, the Earth is considered to be a two-dimensional plane that can be divided into multiple sub-regions. The latitude and longitude inside the sub-area will correspond to the same code. GeoHash-based spatial indexing can improve the efficiency of spatial data for latitude and longitude retrieval. In this paper, GeoHash encodes a square plane separated by a square of latitude and longitude of 0.001373. To improve the prediction accuracy, this paper combines nine adjacent areas into a square area with a length of 411.9873 meters. This paper divides Beijing into 10 × 10 traffic zones and numbers them from 1 to 100. Various indicators of the traffic area will be used as input data for the prediction model, as shown in Table 4 .

Number | Variable name |
---|---|

1 | Zone number of origin traffic zone |

2 | Zone number of destination traffic zone |

3 | Longitude of the origin point |

4 | Latitude of the origin point |

5 | Longitude of the destination point |

6 | Latitude of the destination point |

7 | Distance between the center points of the traffic area |

8 | Absolute value of the difference in the numbers of traffic zone |

9 | Number of the day |

The output of the model is the daily transfer probability of traffic flow among the traffic zones *N* interconnected traffic areas,

where

This paper builds a set of destinations that may correspond to the origin traffic zone of the test day. The calculated destination candidates can be used to predict the probability of the traffic flow among the traffic zones. In the experiment, we selected data of different adjacent days as 6 test groups ( Table 5 ).

The first day of training data | The 2nd day of training data | Prediction data | ||||
---|---|---|---|---|---|---|

Test group | Date | Data amount (trajectories) | Date | Data amount (trajectories) | date | Data amount (trajectories) |

1 | 2017/5/10 | 262,569 | 2017/5/11 | 272,210 | 2017/5/12 | 265,173 |

2 | 2017/5/13 | 225,281 | 2017/5/14 | 236,594 | 2017/5/15 | 279,554 |

3 | 2017/5/14 | 236,594 | 2017/5/15 | 279,554 | 2017/5/16 | 288,719 |

4 | 2017/5/15 | 279,554 | 2017/5/16 | 288,719 | 2017/5/17 | 322,201 |

5 | 2017/5/16 | 288,719 | 2017/5/17 | 322,201 | 2017/5/18 | 315,758 |

6 | 2017/5/10 | 262,569 | 2017/5/15 | 279,554 | 2017/5/16 | 288,719 |

Based on data for the past 2 days as the training data, this paper predicts the subsequent third day of the transfer probabilities of bike-sharing traffic flow. Figure 1 is the root mean square errors of a prediction result of transfer probabilities of bike-sharing traffic flow in Beijing based on the RBM_SVR_DEGWO algorithm.

Compared to the surrounding area, the central area of the city has higher shared bicycle usage and more bicycle trajectory data. Therefore, the Root Mean Square Error of the central region is smaller.

To illustrate the performance of the RBM_SVR_DEGWO algorithm, we calculated the predicted values of the SVR algorithm, the RBM_SVR algorithm, and the RBM_SVR_DEGWO algorithm based on the data from the experimental groups in Table 5 . To ensure the fairness of the results, the data, network structure and parameter settings consistent. Figure 2 shows the mean-square error bars of the predicted transfer probabilities of SVR, RBM_SVR, and RBM_SVR_DEGWO.

The average values of the mean squared errors of the predicted values of the transfer probabilities of the algorithms SVR, RBM_SVR, and RBM_SVR_DEGWO are gradually reduced. The average mean square error of the SVR is 0.0916, the RBM_SVR is 0.0542, and the RBM_SVR_DEGWO is 0.0283. RBM improves the prediction accuracy of the model through the deep network structure. The DEGWO algorithm stabilizes the prediction value error to a lower value by optimizing the parameters of the RBM-SVR. Compared with SVR and RBM_SVR, RBM_SVR_DEGWO algorithm has better robustness.

According to the proposed hybrid algorithm of RBM_SVR_DEGWO and XGBoost, the value of transfer probabilities of bike-sharing traffic flow can be predicted. The data set for this experiment is from the grouped data of Table 5 . The training data set, test data set, and feature variables are the same as those used in the previous experiments. Table 6 is the parameters and explanation of the XGBoost model.

Parameter | Value | Interpretation |
---|---|---|

early_stopping_rounds | 200 | If the loss function does not decrease after the model has been added to n trees continuously, the model will stop adding trees. |

eval_metric | linear | Evaluation index |

eta | 0.3 | The shrinking step size used in the update process. Eta is used to prevent overfitting. |

min_child_weight | 1 | Min_child_weight refers to the sum of the weights of the smallest leaf nodes. If the sample weight of a leaf node is less than min_child_weight then the splitting process ends. |

max_depth | 6 | Maximum depth of the tree |

lambda | 1 | Penalty factor for L2 regular terms |

alpha | 0 | Penalty factor for L1 regular terms |

objective | linear | Loss function |

The root mean square error of the predicted values of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm is shown in Figure 3 . In the six experimental groups, the mean, variance, kurtosis, maximum, minimum, and range of the predicted root mean square error of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm are shown in Figure 3 .

The statistical characteristics of the proposed root mean square error of the algorithms are shown in Figure 4 . The root-mean-square error of the predicted value of the mixed algorithm has a high kurtosis value. It indicates that the variance increases of root mean square error is caused by the extreme difference of low frequency greater than or less than the mean value. The plots of the minimum and variance indicate that RBM_SVR_DEGWO can achieve higher prediction accuracy than XGBoost. XGBoost is more stable than RBM_SVR_DEGWO in the prediction process. In the six experimental groups, compared with the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the mean, variance, maximum, minimum, and range of the root mean square error of the predicted value of the hybrid algorithm are lower. Therefore, by combining the prediction results of the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the hybrid algorithm improves the prediction accuracy and obtains a lower root mean square error of the predicted value. XGBoost scores the importance of each feature based on the number of times the feature is used to segment the sample in all trees and the average gain in all trees. In the six experimental groups, the ranking of each input feature variable is as shown in Figure 5 .

The main factors affecting the transfer probabilities of bike-sharing traffic flow are the destination traffic zone number, the origin traffic zone number, and the absolute value of the difference between the numbers of traffic zone. It shows that the shared bike rider’s choice of destination is usually affected by the starting point, the end position and the distance of the journey. The shared bicycle service is suitable for short trips. Travel destinations for shared bike riders are usually nearby business and lifestyle centers, and bus stops. The dates of the data for the six groups of experiments are within weekdays. On a normal weekday, for the riders in the same community, the main travel destinations are somewhat similar and fixed. Therefore, information such as the cell number of the origin and destination becomes a key factor for predicting the probability of travel destination.

## 7. Conclusions

The principal objective of this study is to predict the traffic flow transfer probability of shared bicycle by proposing a hybrid deep learning algorithm and accurately reflect the transfer probability of the user’s OD demand. First, this paper constructs a deep-structured RBM model and connects it to the SVR model for predicting continuous probability values. Furthermore, we utilize the DEGWO optimization algorithm, named, to optimize the parameters

Based on the comparison results, it demonstrates that the proposed hybrid algorithm outperformed the XGBoost model and RBM_SVR_DEGWO model. The XGBoost algorithm improves the stability of the prediction process and reduces the error of the RBM_SVR_DEGWO algorithm at extreme points. The deep-structured RBM algorithm simulates the probability distribution that best produces the training samples. In the case of massive training data, RBM improves the efficiency of algorithm calculation utilizing Gibbs sampling of small-batch data. In the DEGWO algorithm, the GWO algorithm guarantees the global search capability, and the DE algorithm avoids the fall into a local optimal through the mutant individual, crossover, and selection operations.