Deep Learning Based Prediction of Transfer Probability of Shared Bikes Data

In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of piles does not constrain the origin and destination of the journey, so the travel demand of the user can be derived from the use of the shared bicycle. The goal of this article is to predict the transfer probability of shared bicycle users' destinations based on a deep learning algorithm and a large amount of trajectory data. This study combines the eXtreme Gradient Boosting (XGBoost) algorithm, stacked Restricted Boltzmann Machines (RBM), support vector regression (SVR), the Differential Evolution (DE) algorithm, and the Gray Wolf Optimization (GWO) algorithm. In an experimental case, the destinations of the cycling trips and the transfer probabilities of shared-bike traffic flow between traffic zones were predicted from 2.46 million trajectory points recorded by shared bikes in Beijing. The hybrid algorithm can improve the accuracy of prediction, analyze the importance of various factors in the prediction of transfer probability, and explain the travel preferences of users in the pile-free bicycle sharing scheme.


Introduction
Bicycle sharing is a new type of transportation with low energy consumption and emissions. It serves short-distance travel and helps solve the "last mile" problem [1]. With the rapid development of the mobile Internet, the pile-free bicycle began to replace the pile-station bicycle [2]. In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of piles does not constrain the origin and destination of the journey, so the travel demand of the user can be derived from the use of the shared bicycle. The distribution of destinations of shared bike users is therefore worth studying. However, processing the large number of shared bicycle tracks requires a lot of computation time. This paper sets up different traffic areas and studies the law of shared bicycle flow transfer between the traffic areas. On this basis, we predict the ratio of the traffic flow of shared bicycles between traffic areas. This ratio can be regarded as the probability that a shared bicycle user selects traffic area A as the origin and traffic area B as the destination.
The pile-free shared bicycle system puts a large number of bicycles in the city, far more than the number of traditional piled public bicycles. Therefore, when the data set is a massive volume of trajectory data, classical statistical methods and traditional neural network algorithms have limited processing capabilities.
As shared bikes are a newly developed travel mode, algorithms for predicting the destinations of shared-bike trips need to be researched in depth [3][4][5]. A deep neural network (DNN) is a model with multiple hidden layers that is developed from the artificial neural network. The hidden layers of a DNN convert the input data into a more abstract compound representation [6][7][8][9][10].
The Restricted Boltzmann Machine (RBM) is an algorithm that can be used for dimensionality reduction, classification, regression, and feature learning problems. RBM reconstructs data in an unsupervised manner and adjusts the weights through repeated backward and forward passes. The reconstruction gradually approaches the original input, and the RBM learns the probability distribution over the input set [11][12][13][14][15].
In this paper, a stacked RBM-SVR algorithm is constructed by combining support vector regression (SVR) [16] and the stacked RBM algorithm. RBM-SVR is used to predict continuous output values. The error penalty factor $c_s$ and the kernel function parameter $\gamma_s$ are the basic parameters of the SVR model with a radial basis function kernel. The values of $c_s$ and $\gamma_s$ directly affect the fit and generalization ability of the SVR [17][18][19]. To improve the accuracy of prediction, this paper introduces intelligent algorithms to optimize the selection of these parameter values.
Mirjalili et al. [20] proposed the Gray Wolf Optimizer (GWO) in 2014 as a meta-heuristic algorithm for solving multi-modal functions. In addition, Storn and Price [21] proposed the differential evolution (DE) algorithm. The DE algorithm is an optimization algorithm based on modern intelligence theory. DE guides the direction of the search through groups generated by cooperation and competition among individuals. Based on the above theory, an algorithm called the differential evolution Gray Wolf Optimizer (DEGWO) is used to optimize $c_s$ and $\gamma_s$. DEGWO generates initial populations, subpopulations, and variant populations in each iteration, and then uses the global search capability of GWO to optimize the $c_s$ and $\gamma_s$ parameters.
After processing by RBM algorithm, the input data are transformed into a sparse matrix containing a high amount of information. It can reduce the computation time of SVR prediction, but it may also increase the number of outliers in the SVR algorithm, increase the complexity of the SVR model and reduce the stability of the fitting process. To solve this problem, this paper proposes a hybrid algorithm that combines the eXtreme Gradient Boosting (XGBoost) algorithm with the stacked RBM-SVR algorithm. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable [22]. XGBoost uses the Exclusive Feature Bundling method to transform several sparse features into a dense matrix [23]. In the process, the approximate greedy algorithm is used to find the best combination of merged features when the number of bundles is the smallest. The XGB algorithm realizes the partition of nodes by the second order gradient and optimizes the greedy algorithm of splitting nodes by the local approximation algorithm [24].
The principal purpose of this paper is to build a hybrid model that combines the XGBoost model, the stacked RBM-SVR network, and the DEGWO optimization algorithm. This paper analyzes the trajectory data of shared bicycles, extracts the cell information, and predicts the probability of user destination selection in the traffic area, that is, predicts the transfer probability of shared bikes.

Background
Artificial intelligence (AI) is a domain of computer science that studies how to use computers to simulate the fundamental theories, methods, and techniques of human intelligence. As the mainstream approach in artificial intelligence, deep learning is considered capable of solving many challenges in computer vision, prediction, and optimization. It realizes the automatic positioning of targets and automated learning of target features, which improves the speed and accuracy of target detection. Artificial intelligence is mainly applied to bicycle sharing in the following aspects.
First, the user's travel behavior and the law of spatial movement can be obtained through machine learning algorithms and statistical theory analysis. The user's travel preferences can be quantitatively analyzed. Researchers can discuss the impact of various influencing factors on shared bicycle usage, such as the mix of land use, the degree of convergence with public transport facilities, the sharing of bicycle infrastructure, rainfall and high temperatures [25,26].
Second, through the deep learning algorithm of AI technology, the dynamic demand and parking demand of the shared bicycle users can be predicted. The focus of this paper is on this issue. This paper uses a deep learning algorithm to predict the probability of user destination selection. Xu et al. [27] prove that the long short-term memory neural networks (LSTM NNs) in deep learning algorithms are superior to traditional statistical metrology algorithms and advanced machine learning algorithms. LSTM NNs can better predict the riding demand dynamically. Besides, based on the distribution of road networks and travel needs, researchers can predict parking demand and develop better layout strategies for electronic fences [28].
Finally, a shared bicycle scheduling model can be constructed with the deep reinforcement learning algorithms of AI technology. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning. It can make control decisions directly from the original input data, which makes it an artificial intelligence method that is closer to human thinking. Based on this algorithm, the dynamic scheduling model can efficiently optimize goals such as improving user satisfaction and reducing system cost [29][30][31].

A stacked RBM_SVR deep learning algorithm
RBM_SVR is a deep learning model that connects three stacked RBM models and one SVR model. First, the bottommost RBM is trained with the original input data, and the RBM above it takes the features extracted by the bottom RBM as input and continues training. RBM_SVR repeats this process until the topmost RBM is trained. Second, RBM_SVR fine-tunes the network through the traditional global learning algorithm (the BP algorithm), so that the model can converge to a local optimum. Finally, RBM_SVR can efficiently train a deep network and output the predicted probability value through the SVR model.
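Each RBM model has a visible layer $v$ and a hidden layer $h$. The neurons within a layer are unconnected, but the neurons in different layers are fully connected. The value of each RBM node variable is 0 or 1. The numbers of neurons in the visible layer and the hidden layer are $n$ and $m$, respectively. The energy equation of the RBM is given by Eq. (1):
$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \tag{1}$$
where $\theta = \{w_{ij}, a_i, b_j\}$ denotes the parameters of the RBM; $w_{ij}$ is the connection weight between visible unit $i$ and hidden unit $j$; $a_i$ represents the bias of visible unit $i$, and $b_j$ denotes the bias of hidden unit $j$. The joint probability distribution of $(v, h)$ is
$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)}.$$
Since the units within each layer are conditionally independent, the activation probabilities of the $j$th hidden unit and the $i$th visible unit are given by Eqs. (2) and (3), respectively [33]:
$$P(h_j = 1 \mid v, \theta) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{2}$$
$$P(v_i = 1 \mid h, \theta) = \sigma\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big) \tag{3}$$
where $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic function.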
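As a minimal sketch of the quantities described above, the following Python code evaluates the standard energy function and the conditional activation probabilities of a binary RBM. The function names and the toy parameter values are illustrative assumptions and are not taken from the experiments in this chapter.

```python
import math

def sigmoid(x):
    """Logistic function used for the RBM activation probabilities."""
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, w, a, b):
    """Energy of a binary RBM: -sum(a_i v_i) - sum(b_j h_j) - sum(v_i w_ij h_j)."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * w[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction

def p_hidden(v, w, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]

def p_visible(h, w, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

# Toy example (arbitrary values): n = 3 visible units, m = 2 hidden units.
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
v = [1, 0, 1]
h = [1, 0]
toy_energy = energy(v, h, w, a, b)
toy_p_hidden = p_hidden(v, w, b)
toy_p_visible = p_visible(h, w, a)
```

In a stacked arrangement, the hidden activation probabilities of one RBM would serve as the visible input of the next; training (for example by contrastive divergence) is deliberately left out of this sketch.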
In RBM_SVR, each RBM layer has 300 neurons. Based on the abstracted vector output by the stacked RBM model, the SVR model predicts the transfer probability of traffic among traffic zones, $y_d$, as shown in Eq. (4):
$$y_d = \mathrm{RBM{:}SVR}(x) \tag{4}$$
where $x$ is the input dataset and $\mathrm{RBM{:}SVR}(\cdot)$ represents the RBM_SVR model.

An improved RBM-SVR algorithm

Principles of GWO algorithm
Assume that in a $D$-dimensional search space, the population is $X = (X_1, X_2, \ldots, X_N)$, where $X_i$ is the position of gray wolf $i$, that is, a candidate solution to the optimization problem. The three wolves with the best objective function values are wolf α, wolf β, and wolf δ, in that order. They are the main wolves that guide the rest of the pack toward the optimal solution. The remaining solutions correspond to the wolves ω. The parameters and explanations of the GWO algorithm are shown in Table 1.
The update process of $X(t)$ is given by Eq. (5). The three best solutions obtained so far are saved, and the other searching individuals (including ω) constantly update their positions according to these best positions, as expressed in Eqs. (6)-(7).
where $\pi = \alpha, \beta, \delta$ and $\mu = 1, 2, 3$. The distances between the other individual gray wolves and α, β, and δ, $D_\pi = |C_\mu X_\pi(t) - X(t)|$, and the updated positions of the gray wolves are determined by Eqs. (5) and (6). Then, the position of the prey can be determined by Eq. (7).
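The update rules described above can be sketched in Python as follows. This is a minimal sketch of the standard GWO update as published by Mirjalili et al., under the assumption that the chapter uses the same form; the function name `gwo_minimize`, the population size, the iteration count, the bound clipping, and the sphere test function are all illustrative assumptions.

```python
import random

def gwo_minimize(objective, dim, lower, upper, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic GWO loop."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best = min(wolves, key=objective)
    for t in range(n_iter):
        # The three best wolves act as alpha, beta, and delta.
        leaders = sorted(wolves, key=objective)[:3]
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for q in range(dim):
                guided = []
                for leader in leaders:
                    c = 2.0 * rng.random()              # swing factor C = 2 r_1
                    big_a = 2.0 * a * rng.random() - a  # coefficient A = 2 a r_2 - a
                    d = abs(c * leader[q] - x[q])       # distance D_pi
                    guided.append(leader[q] - big_a * d)
                # New position: average of the three leader-guided positions,
                # clipped to the search range (clipping is an added assumption).
                value = sum(guided) / 3.0
                new_x.append(min(max(value, lower), upper))
            new_wolves.append(new_x)
        wolves = new_wolves
        best = min(wolves + [best], key=objective)
    return best

def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(value * value for value in x)

gwo_solution = gwo_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
```

In the setting of this chapter, the two search dimensions would correspond to the SVR parameters $c_s$ and $\gamma_s$, and the objective would be a prediction error rather than the sphere function.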

Principles of the DE algorithm
Assume that in the $D$-dimensional search space, with population size $NP$, $Z(g)$ is the $g$th generation of the population and $z_k(g)$ is its $k$th individual, with $k = 1, 2, \ldots, NP$ and $g = 1, 2, \ldots, g_{max}$, where $g_{max}$ is the number of the last iteration.

Initialization of the population
Initially, the algorithm randomly generates the 0th generation of the population over the entire search space, and the value of the individual $z_{k,q}(0)$ in each dimension $q$ is generated according to Eq. (8):
$$z_{k,q}(0) = z^{L}_{k,q} + \mathrm{rand}(0, 1)\,\big(z^{U}_{k,q} - z^{L}_{k,q}\big) \tag{8}$$
where $q = 1, 2, \ldots, D$; $\mathrm{rand}(0, 1)$ is a random number uniformly distributed within $[0, 1]$; $z^{L}_{k,q}$ is the lower threshold of the individual; and $z^{U}_{k,q}$ is the upper threshold of the individual.
In the mutation operation of Eq. (9), $z_{p_1}$, $z_{p_2}$, and $z_{p_3}$ are three different parameter vectors randomly selected from the current population.

Table 1. Parameters and explanations of the GWO algorithm.
Parameter | Explanation
$t$ | The number of current iterations
$C$ | The swing factor, $C = 2r_1$
$X_p(t)$ | The position of the prey after the $t$th iteration
$X(t)$ | The position of the gray wolf during the $t$th iteration
$a$ | $a$ linearly decreases from 2 to 0 as the number of iterations increases
$D_\pi$ | The distances between the individual gray wolves, $D_\pi = |C_\mu X_\pi(t) - X(t)|$

Crossover
The crossover process in the DE algorithm is expressed as Eq. (10).
where $CR$ is the crossover probability within $[0, 1]$, and $\mathrm{rand}(0, 1)$ is a random number uniformly distributed within $[0, 1]$, used to guarantee that the component in at least one dimension comes from the target vector $Z_k$.

Selection
The selection operation compares the vector $\mu_k(g + 1)$ with the vector $z_k(g)$ using an evaluation function, as given by Eq. (11).
Therefore, this mechanism allows the offspring populations to evolve from the current population. It improves the average fitness of the population and drives convergence to the optimal solution.
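The initialization, mutation, crossover, and selection steps above can be put together in a short Python sketch. It assumes the classic DE/rand/1/bin variant of Storn and Price, which may differ in detail from Eqs. (8)-(11); the scaling factor `f_scale = 0.5`, the crossover probability `cr = 0.9`, the bound clipping, and the sphere test function are illustrative assumptions.

```python
import random

def de_minimize(objective, dim, lower, upper, np_size=20, g_max=100,
                f_scale=0.5, cr=0.9, seed=0):
    """Minimize `objective` over [lower, upper]^dim with DE/rand/1/bin."""
    rng = random.Random(seed)
    # Initialization: lower + rand(0, 1) * (upper - lower) in every dimension.
    pop = [[lower + rng.random() * (upper - lower) for _ in range(dim)]
           for _ in range(np_size)]
    for _ in range(g_max):
        next_pop = []
        for k, target in enumerate(pop):
            # Mutation: combine three distinct individuals other than the target.
            p1, p2, p3 = rng.sample([i for i in range(np_size) if i != k], 3)
            mutant = [pop[p1][q] + f_scale * (pop[p2][q] - pop[p3][q])
                      for q in range(dim)]
            # Keep the mutant inside the search range (an added assumption).
            mutant = [min(max(value, lower), upper) for value in mutant]
            # Crossover: take the mutant component with probability cr; the
            # index q_rand forces at least one mutant component into the trial.
            q_rand = rng.randrange(dim)
            trial = [mutant[q] if (rng.random() <= cr or q == q_rand) else target[q]
                     for q in range(dim)]
            # Selection: the better of trial and target survives.
            if objective(trial) <= objective(target):
                next_pop.append(trial)
            else:
                next_pop.append(target)
        pop = next_pop
    return min(pop, key=objective)

def sphere_de(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(value * value for value in x)

de_solution = de_minimize(sphere_de, dim=2, lower=-5.0, upper=5.0)
```

Because selection never replaces an individual with a worse one, the best objective value in the population cannot increase from one generation to the next.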
Table 2 (excerpt). Output: $[r_{test}, Z_{parent:\alpha}]$. Initialize $a$, $A$, $C$, $Z_{parent}$, and the objective function $V_{parent}$; for each individual wolf $k$ do …

DEGWO algorithm
In the DEGWO algorithm, $S_{degwo:dbn} = (NP, g_{max}, CR, D, ub, lb, F)$, where $NP$ denotes the population size, $g_{max}$ denotes the maximum number of iterations, and $ub$ and $lb$ are the upper and lower bounds of the search range. $r_{test}$ and $r_{train}$ denote the errors in the testing and training procedures, respectively. Table 2 gives the specific procedure that employs the DE and GWO algorithms to optimize the parameters $c_s$ and $\gamma_s$ in the RBM-SVR deep learning model.
Here, $x_i$ is a feature vector and $i$ indexes the data points. $h_n(x_i)$ is the regression tree function, with $h_n \in H$, where $H$ is the space of regression trees.
In Eq. (13), $f: \mathbb{R}^m \to T$, and $f(X)$ indicates the leaf node to which sample $X$ is assigned. $T$ represents the number of leaf nodes of the tree. $\alpha$ is the score of the leaf node. $\alpha_{f(x)}$ represents the predicted value of the regression tree for the sample.

XGBoost learning objective function
The objective function based on the parameter space is shown in the following Eq. (14).
where $\Omega(\phi)$ is a regularization term that penalizes the complexity of the model. In a linear model, the regular term $\Omega(\phi)$ includes the $L_1$ regular term, $\Omega(\alpha) = \lambda \lVert \alpha \rVert_1$, and the $L_2$ regular term, $\Omega(\alpha) = \lambda \lVert \alpha \rVert^2$. $L(\phi)$ is an error function that measures how well the model fits the data; minimizing it reduces model bias. Examples are the square loss and the exponential loss. Compared to GBDT, XGBoost adds a regular term to the objective function. XGBoost penalizes the complexity of each regression tree to avoid overfitting during learning. The complexity of a tree can be measured by, for example, the number of internal nodes, the depth of the tree, the number of leaf nodes $T$, and the leaf node scores $\alpha$. XGBoost uses the regular term shown in Eq. (15).

Model optimization
In the model parameter optimization process, each iteration adds a new function to the optimal model obtained from the previous training round. After the $k$th iteration, the prediction of the model equals the prediction of the first $k - 1$ trees combined with the $k$th tree, as shown by Eq. (16). The objective function can then be rewritten as Eq. (17).
In Eq. (17), the model's goal is to learn the function of the $k$th tree. When the error function is replaced by its second-order Taylor expansion, the objective function can be rewritten as Eq. (18). This objective function applies to regression, classification, and ranking problems. Eqs. (20) and (21) express the regression tree function and the regular term in the form of a tree structure. The objective function can be updated to Eq. (22).
This article defines the sample set on each leaf node as $J_j = \{\, i \mid f(x_i) = j \,\}$. The objective function based on the form of leaf node accumulation is Eq. (23).
This paper assumes that the structure of the tree is fixed (i.e., $f(x_i)$ is determined). To minimize the objective function, we set its derivative to zero. The optimal predicted score for each leaf node is Eq. (24). The formula for the minimum loss function is Eq. (25), which can be thought of as a function that scores the tree structure. The tree structure is gradually optimized as the score is reduced.
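The closed-form results referred to above can be illustrated with a small Python sketch. It assumes the standard XGBoost formulation, in which the optimal leaf score is $-\delta_j / (\eta_j + \lambda)$ and the minimum loss is $-\tfrac{1}{2} \sum_j \delta_j^2 / (\eta_j + \lambda) + \gamma T$, where $\delta_j$ and $\eta_j$ are the sums of first- and second-order gradients on leaf $j$. The function names are hypothetical, and Eqs. (24) and (25) may be scaled differently.

```python
def leaf_stats(gradients, hessians):
    """Sum the first- and second-order gradients of the samples on one leaf."""
    return sum(gradients), sum(hessians)

def optimal_leaf_score(delta, eta, lam):
    """Leaf score that minimizes the second-order approximation of the loss."""
    return -delta / (eta + lam)

def structure_score(leaves, lam, gamma):
    """Minimum loss of a fixed tree; `leaves` is a list of (delta_j, eta_j)."""
    fit_term = sum(delta * delta / (eta + lam) for delta, eta in leaves)
    return -0.5 * fit_term + gamma * len(leaves)

# Example with the loss 0.5 * (y - y_hat)^2 and current predictions y_hat = 0:
# the gradient is y_hat - y and the second-order gradient is 1 for every sample.
targets = [1.0, 2.0, 3.0]
grads = [0.0 - y for y in targets]
hess = [1.0 for _ in targets]
delta_1, eta_1 = leaf_stats(grads, hess)
leaf_score = optimal_leaf_score(delta_1, eta_1, lam=1.0)
tree_score = structure_score([(delta_1, eta_1)], lam=1.0, gamma=0.0)
```

With $\lambda = 0$ the leaf score would equal the mean of the targets; the regularization shrinks it toward zero.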

Structure score
Eq. (25) is a function for scoring a tree structure, called the structure score. The smaller the score, the better the tree structure. The algorithm searches for the optimal tree structure by using Eq. (25). $L^*$ represents the contribution of the leaf nodes to the overall loss. The goal of the algorithm is to minimize the loss, so the term $\frac{\delta_j^2}{\eta_j + \lambda}$ should be as large as possible. This article expands a leaf node and defines the gain as shown in Eq. (26).
In Eq. (26), $\frac{\delta_L^2}{\eta_L + \lambda}$ is the score of the left subtree, $\frac{\delta_R^2}{\eta_R + \lambda}$ is the score of the right subtree, $\frac{(\delta_L + \delta_R)^2}{\eta_L + \eta_R + \lambda}$ is the score without splitting, and $\gamma$ is the complexity cost of introducing the new leaf node. The larger the gain, the more the loss is reduced by splitting. Therefore, when splitting a leaf node, we calculate the gain for all candidate features and select the split with the largest gain.

Best branch
The core part of the XGBoost algorithm is to find the optimal split node based on the maximum gain. XGBoost looks for the best branch using a greedy algorithm. The greedy algorithm traverses all possible split points of all features, calculates the gain of each, and selects the split with the maximum gain. A greedy algorithm makes the locally optimal choice at each step in order to approach a global optimum. The decision tree algorithm can also be regarded as a greedy method. XGBoost is an ensemble model of trees. If each split is treated as optimal, the tree is grown without enumerating all possible tree structures. XGBoost uses the objective function to measure the structure of the tree and then lets the tree grow from depth 0. Each time a branch is evaluated, XGBoost calculates the reduction in the objective function. When the reduction falls below a certain value, the tree stops growing.
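The exact greedy search described above can be sketched in Python as follows. The gain uses the standard XGBoost form, including a factor of 1/2 that Eq. (26) may or may not carry; the function names, the default values of `lam` and `gamma`, and the choice of the midpoint as threshold are illustrative assumptions.

```python
def split_gain(d_left, e_left, d_right, e_right, lam, gamma):
    """Loss reduction obtained by splitting one leaf into a left and right leaf."""
    def score(delta, eta):
        return delta * delta / (eta + lam)
    unsplit = score(d_left + d_right, e_left + e_right)
    return 0.5 * (score(d_left, e_left) + score(d_right, e_right) - unsplit) - gamma

def best_split(rows, gradients, hessians, lam=1.0, gamma=0.0):
    """Scan every feature and split point; return (gain, feature, threshold)."""
    total_d, total_e = sum(gradients), sum(hessians)
    best = None
    for feature in range(len(rows[0])):
        order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
        d_left = e_left = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            d_left += gradients[i]
            e_left += hessians[i]
            if rows[i][feature] == rows[nxt][feature]:
                continue  # no threshold separates two equal feature values
            gain = split_gain(d_left, e_left, total_d - d_left,
                              total_e - e_left, lam, gamma)
            if best is None or gain > best[0]:
                threshold = 0.5 * (rows[i][feature] + rows[nxt][feature])
                best = (gain, feature, threshold)
    return best
```

A tree-growing loop would apply `best_split` recursively to the two resulting sample sets and stop when the best gain falls below a chosen value, which corresponds to the stopping rule described above.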

Hybrid model based on RBM_SVR_DEGWO and XGBoost
After the boosting tree is created, the XGBoost algorithm extracts an importance score for each attribute. The XGBoost importance score measures the value of a feature in improving decision tree construction. The more an attribute is used to build the decision trees, the more important it is [35]. To further improve the accuracy of prediction and analyze the importance of the features, this paper uses XGBoost to extract the feature importance scores. By combining them with the predicted values of the proposed RBM_SVR_DEGWO model, this paper proposes a hybrid prediction model, as shown in Table 3.

Experimental description and result analysis
This paper analyzes 2,468,059 trajectory records from Mobike's shared bikes. The data cover more than 300,000 users and 400,000 shared bikes. Each rental trip record includes the start time, the end time, the GeoHash code of the starting position, the GeoHash code of the ending position, the bicycle ID, and the user ID.
GeoHash is an algorithm for spatial indexing. In GeoHash theory, the Earth is treated as a two-dimensional plane that can be divided into multiple subregions. All latitude and longitude points inside a subregion correspond to the same code. GeoHash-based spatial indexing can improve the efficiency of latitude and longitude retrieval of spatial data. In this paper, each GeoHash code covers a square cell with a side of 0.001373 in latitude and longitude. To improve the prediction accuracy, this paper combines nine adjacent cells into a square area with a side length of 411.9873 meters. This paper divides Beijing into 10 × 10 traffic zones and numbers them from 1 to 100. Various indicators of the traffic zones are used as input data for the prediction model, as shown in Table 4.
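The zone numbering can be illustrated with a short Python sketch that maps a coordinate to one of the 10 × 10 traffic zones. The bounding box used in the example and the row-by-row numbering order are hypothetical assumptions, since the chapter does not state the study area boundaries or the numbering scheme.

```python
def zone_number(lat, lon, south, west, north, east, n_side=10):
    """Return a zone number in 1..n_side**2, or None outside the bounding box."""
    if not (south <= lat <= north and west <= lon <= east):
        return None
    # Row and column index of the grid cell; points on the outer edge are
    # assigned to the last row or column.
    row = min(int((lat - south) / (north - south) * n_side), n_side - 1)
    col = min(int((lon - west) / (east - west) * n_side), n_side - 1)
    # Assumed numbering: rows from south to north, columns from west to east.
    return row * n_side + col + 1

# Hypothetical bounding box, chosen only for illustration.
BOX = dict(south=39.8, west=116.2, north=40.0, east=116.4)
corner_zone = zone_number(39.8, 116.2, **BOX)
inner_zone = zone_number(39.91, 116.25, **BOX)
outside_zone = zone_number(41.0, 116.25, **BOX)
```

In practice the trip records carry GeoHash codes rather than raw coordinates, so each code would first be decoded to the center of its cell before a mapping of this kind is applied.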
Table 3. Hybrid algorithm based on RBM_SVR_DEGWO and XGBoost (Algorithm 2; output: $y_{hybrid}$).
The output of the model is the daily transfer probability of traffic flow among the traffic zones, $p^t_{I,J}$, which is given by Eq. (27). In a city of $N$ interconnected traffic zones, $p^t_{I,J}$ indicates the transfer probability of the traffic flow with origin $I$ and destination $J$ on day $t$,
where $I = 1, 2, 3, \ldots, N$; $J = 1, 2, 3, \ldots, N$; and $d^t_{I,J}$ refers to the traffic flow with origin $I$ and destination $J$ on day $t$. $p^t_{I,J}$ represents the origin-destination (OD) probability distribution and reflects the distribution of demand in the city. This paper builds a set of destinations that may correspond to the origin traffic zone of the test day. The calculated destination candidates can be used to predict the transfer probability of the traffic flow among the traffic zones. In the experiment, we selected data of different adjacent days as 6 test groups (Table 5).
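As a minimal sketch of how such transfer probabilities can be computed from one day of trip records, the following Python code normalizes each origin-destination flow by the total outflow of its origin zone. This normalization is an assumption made for illustration; Eq. (27) may define the ratio differently. The function name and the toy trips are hypothetical.

```python
from collections import Counter

def transfer_probabilities(trips):
    """Map (origin, destination) to the share of the origin's outgoing trips.

    `trips` is an iterable of (origin_zone, destination_zone) pairs for one day.
    """
    trips = list(trips)
    od_counts = Counter(trips)
    origin_totals = Counter(origin for origin, _ in trips)
    return {(origin, dest): count / origin_totals[origin]
            for (origin, dest), count in od_counts.items()}

# Toy day with four trips between zones 1, 2, and 3.
day_trips = [(1, 2), (1, 2), (1, 3), (2, 1)]
day_probs = transfer_probabilities(day_trips)
```

Under this assumption the probabilities of all destinations reached from the same origin sum to one, so each row of the OD matrix is a probability distribution.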
Using the data of the previous 2 days as training data, this paper predicts the transfer probabilities of bike-sharing traffic flow on the third day. Figure 1 shows the root mean square errors of the transfer probabilities of bike-sharing traffic flow in Beijing predicted by the RBM_SVR_DEGWO algorithm.
Compared to the surrounding area, the central area of the city has higher shared bicycle usage and more bicycle trajectory data. Therefore, the root mean square error of the central region is smaller.
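For reference, the root mean square error used throughout this section can be computed as in the following Python sketch. The function name and the sample values are illustrative and are not data from the experiments.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equally long, non-empty sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative transfer probabilities for three origin-destination pairs.
example_rmse = rmse([0.1, 0.2, 0.3], [0.1, 0.2, 0.5])
```

The mean squared error reported for the algorithm comparison is the same quantity without the final square root.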
To illustrate the performance of the RBM_SVR_DEGWO algorithm, we calculated the predicted values of the SVR algorithm, the RBM_SVR algorithm, and the RBM_SVR_DEGWO algorithm based on the data from the experimental groups in Table 5. To ensure the fairness of the results, the data, network structure, and parameter settings are kept consistent. Figure 2 shows the mean square error bars of the predicted transfer probabilities of SVR, RBM_SVR, and RBM_SVR_DEGWO. The average mean squared errors of the predicted transfer probabilities decrease in the order SVR, RBM_SVR, RBM_SVR_DEGWO. The average mean square error of the SVR is 0.0916, that of the RBM_SVR is 0.0542, and that of the RBM_SVR_DEGWO is 0.0283. RBM improves the prediction accuracy of the model through the deep network structure. The DEGWO algorithm stabilizes the prediction error at a lower value by optimizing the parameters of the RBM-SVR. Compared with SVR and RBM_SVR, the RBM_SVR_DEGWO algorithm is more robust.
With the proposed hybrid algorithm of RBM_SVR_DEGWO and XGBoost, the transfer probabilities of bike-sharing traffic flow can be predicted. The data set for this experiment is the grouped data of Table 5. The training data set, test data set, and feature variables are the same as those used in the previous experiments. Table 6 lists the parameters and explanations of the XGBoost model. The root mean square error of the predicted values of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm is shown in Figure 3. In the six experimental groups, the mean, variance, kurtosis, maximum, minimum, and range of the predicted root mean square error of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm are shown in Figure 3.
Table 6. Parameters and explanations of the XGBoost model.
The statistical characteristics of the root mean square error of the proposed algorithms are shown in Figure 4. The root mean square error of the predicted values of the hybrid algorithm has a high kurtosis value. This indicates that the increase in the variance of the root mean square error is caused by infrequent extreme deviations above or below the mean. The plots of the minimum and variance indicate that RBM_SVR_DEGWO can achieve higher prediction accuracy than XGBoost, whereas XGBoost is more stable than RBM_SVR_DEGWO in the prediction process. In the six experimental groups, compared with the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the mean, variance, maximum, minimum, and range of the root mean square error of the predicted values of the hybrid algorithm are lower. Therefore, by combining the prediction results of the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the hybrid algorithm improves the prediction accuracy and obtains a lower root mean square error of the predicted values. XGBoost scores the importance of each feature based on the number of times the feature is used to split the samples in all trees and the average gain in all trees. In the six experimental groups, the ranking of each input feature variable is shown in Figure 5.
The main factors affecting the transfer probabilities of bike-sharing traffic flow are the destination traffic zone number, the origin traffic zone number, and the absolute value of the difference between the two zone numbers. This shows that a shared bike rider's choice of destination is usually affected by the starting point, the end position, and the distance of the journey. The shared bicycle service is suitable for short trips. Travel destinations of shared bike riders are usually nearby business and lifestyle centers and bus stops. The dates of the data for the six groups of experiments all fall on weekdays. On a normal weekday, the main travel destinations of riders in the same community are somewhat similar and fixed. Therefore, information such as the zone numbers of the origin and destination becomes a key factor for predicting the probability of a travel destination.

Conclusions
The principal objective of this study is to predict the traffic flow transfer probability of shared bicycles with a hybrid deep learning algorithm and to accurately reflect the transfer probability of the users' OD demand. First, this paper constructs a deep-structured RBM model and connects it to the SVR model for predicting continuous probability values. Furthermore, we use the DEGWO optimization algorithm to optimize the parameters $c_s$ and $\gamma_s$ in the stacked RBM-SVR algorithm. XGBoost improves the prediction accuracy and analyzes the importance of the feature variables in the input data.
The comparison results demonstrate that the proposed hybrid algorithm outperforms the XGBoost model and the RBM_SVR_DEGWO model. The XGBoost algorithm improves the stability of the prediction process and reduces the error of the RBM_SVR_DEGWO algorithm at extreme points. The deep-structured RBM algorithm models the probability distribution that best produces the training samples. In the case of massive training data, RBM improves computational efficiency by applying Gibbs sampling to mini-batches of data. In the DEGWO algorithm, the GWO algorithm guarantees the global search capability, and the DE algorithm avoids falling into a local optimum through the mutation, crossover, and selection operations.

Author details
Wenwen Tu
Southwest Jiaotong University, Chengdu, China
*Address all correspondence to: tuvivimic@gmail.com
© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.