Open access peer-reviewed chapter - ONLINE FIRST

Deep Learning Based Prediction of Transfer Probability of Shared Bikes Data

By Wenwen Tu

Submitted: June 7th 2019Reviewed: September 20th 2019Published: October 29th 2019

DOI: 10.5772/intechopen.89835

Downloaded: 36

Abstract

In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of the pile does not constrain the origin and destination of the journey. The travel demand of the user can be derived from the use of the shared bicycle. The goal of this article is to predict the probability of transition for a shared bicycle user destination based on a deep learning algorithm and a large amount of trajectory data. This study combines eXtreme Gradient Boosting (XGBoost) algorithm, stacked Restricted Boltzmann Machines (RBM), support vector regression (SVR), Differential Evolution (DE) algorithm, and Gray Wolf Optimization (GWO) algorithm. In an experimental case, the destinations of the cycling trips and the probability of traffic flow transfer for shared bikes between traffic zones were predicted by computing 2.46 million trajectory points recorded by shared bikes in Beijing. The hybrid algorithm can improve the accuracy of prediction, analyze the importance of various factors in the prediction of transfer probability, and explain the travel preferences of users in the pile free bicycle-sharing scheme.

Keywords

  • deep learning
  • restricted Boltzmann machines
  • support vector regression
  • eXtreme gradient boosting
  • shared bikes data

1. Introduction

Bicycle sharing is a new type of transportation with low energy consumption and emissions. It serves short-distance travel and helps solve the “last mile” problem [1]. With the rapid development of the mobile Internet, the pileless bicycle began to replace the pile station bicycle [2]. In the pile-free bicycle sharing scheme, the parking place and time of the bicycle are arbitrary. The distribution of the pile does not constrain the origin and destination of the journey. The travel demand of the user can be derived from the use of the shared bicycle. The distribution of destinations for shared bike users is a valuable study. However, the large number of shared bicycle tracks requires a lot of computation time. This paper sets up different traffic areas and studies the law of shared bicycle flow transfer between the traffic areas. On this basis, we predict the ratio of the traffic flow of shared bicycles between traffic areas. It can be considered as the probability that the shared bicycle user selects the traffic area A as the origin and the traffic area B as the destination.

The pile-free shared bicycle system puts a large number of bicycles in the city. The amount is much higher than the number of traditional piled public bicycles. Therefore, when dealing with massive volume of trajectory data volume as a data set, classical statistical methods and traditional neural network algorithms would have limited processing capabilities.

As a newly developed travel method, the algorithms for the destination prediction of trips based on shared bikes need to be researched in depth [3, 4, 5]. In Deep neural networks (DNN), the model with multi-hidden layers can be developed based on the artificial neural network. The hidden layers of DNN convert the input data into a more abstract compound representation [6, 7, 8, 9, 10].

The Restricted Boltzmann Machine (RBM) is an algorithm that can be used for dimensionality reduction, classification, regression, and feature learning problems. RBM reconstructs data in an unsupervised algorithm and adjusts the weight through the process of reverse transfer and forward transfer. The RBM gradually approaches the original input and learns the probability distribution on the input set [11, 12, 13, 14, 15].

In this paper, a stacked RBM-SVR algorithm is constructed by combining support vector regression (SVR) [16] and stacking RMB algorithm. RBM-SVR is used to predict continuous output values. The error penalty factor csand kernel function parameter γsare the basic parameters of the radial basis function of the SVR model. The value of csand γswill directly affect the fit and generalization ability of the SVR [17, 18, 19]. In order to improve the accuracy of prediction, this paper needs to introduce intelligent algorithms to optimize the selection of parameter values.

In machine learning algorithms, Mirjalili et al. [20] proposed Gray Wolf Optimizer (GWO) as a meta-heuristic algorithm for solving many multi-modal functions in 2014. In addition, Storn and Price [21] proposed a differential evolution (DE) algorithm. The DE algorithm is an optimization algorithm based on modern intelligent theory. DE intelligently optimizes the direction of search through groups generated by cooperation and competition among individuals. Based on the above theory, an algorithm called the differential evolution Gray Wolf Optimizer algorithm (DEGWO) is used to optimize csand γs. DEGWO generates initial populations, subpopulations, and variant populations for each iteration, and then uses the GWO’s capabilities for global searching to optimize the csand γsparameters.

After processing by RBM algorithm, the input data are transformed into a sparse matrix containing a high amount of information. It can reduce the computation time of SVR prediction, but it may also increase the number of outliers in the SVR algorithm, increase the complexity of the SVR model and reduce the stability of the fitting process. To solve this problem, this paper proposes a hybrid algorithm that combines the eXtreme Gradient Boosting (XGBoost) algorithm with the stacked RBM-SVR algorithm. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable [22]. XGBoost uses the Exclusive Feature Bundling method to transform several sparse features into a dense matrix [23]. In the process, the approximate greedy algorithm is used to find the best combination of merged features when the number of bundles is the smallest. The XGB algorithm realizes the partition of nodes by the second order gradient and optimizes the greedy algorithm of splitting nodes by the local approximation algorithm [24].

The principal purpose of this paper is to build a hybrid model that combines the XGBoost model, the stacked RBM-SVR network, and the DEGWO optimization algorithm. This paper analyzes the trajectory data of shared bicycles, extracts the cell information, and predicts the probability of user destination selection in the traffic area, that is, predicts the transfer probability of shared bikes.

2. Background

Artificial intelligence (AI) is a domain of computer science that studies how to apply computings to simulate the fundamental theories, methods, and techniques of human knowledge. As the mainstream algorithm of artificial intelligence, deep learning is considered capable of solving many challenges in the field of computer vision, prediction, and optimization. It realizes the automatic positioning of targets and automated learning of target features, which improves the speed and accuracy of target detection. Artificial intelligence is mainly used to share bicycles in the following aspects.

First, the user’s travel behavior and the law of spatial movement can be obtained through machine learning algorithms and statistical theory analysis. The user’s travel preferences can be quantitatively analyzed. Researchers can discuss the impact of various influencing factors on shared bicycle usage, such as the mix of land use, the degree of convergence with public transport facilities, the sharing of bicycle infrastructure, rainfall and high temperatures [25, 26].

Second, through the deep learning algorithm of AI technology, the dynamic demand and parking demand of the shared bicycle users can be predicted. The focus of this paper is on this issue. This paper uses a deep learning algorithm to predict the probability of user destination selection. Xu et al. [27] prove that the long short-term memory neural networks (LSTM NNs) in deep learning algorithms are superior to traditional statistical metrology algorithms and advanced machine learning algorithms. LSTM NNs can better predict the riding demand dynamically. Besides, based on the distribution of road networks and travel needs, researchers can predict parking demand and develop better layout strategies for electronic fences [28].

Finally, according to the deep reinforcement learning algorithm in AI technology, a shared bicycle scheduling model can be constructed. Deep reinforcement learning combines the perception of deep learning with the decision-making ability of reinforcement learning. It can be directly controlled according to the original input data. It is an artificial intelligence method that is closer to human thinking. Based on this algorithm, the dynamic scheduling model can efficiently optimize goals such as improving user satisfaction and reducing system cost [29, 30, 31].

3. A stacked RBM_SVR deep learning algorithm

RBM_SVR is a deep learning model that connects three stacked RBM models and one SVR model. First, in RBM_SVR, the bottommost RBM is trained with the original input data, and the top RBM takes the feature extracted by the bottom RBM as input and continues training. RBM_SVR repeats this process until the topmost RBM model is trained. Secondly, RBM_SVR fine-tunes the network through the traditional global learning algorithm (BP algorithm), so that the model can converge to the local best. Finally, RBM_SVR can efficiently train a deep network and output the predicted probability value according to the SVR model.

Each RBM model has a visible layer vand a hidden layer h. The neurons inside the RBM layer are unconnected, but the neurons between the layers are fully connected. The value of the RBM node variable is 0 or 1. The number of layers of visible layers and hidden layers of the RBM_SVR model are nand m, respectively. The energy equation of RBM_SVR is given by Eq. (1)

Evhθ=i=1naivij=1mbjhji=1nj=1mviwijhjE1

where i,j01, θ=wijaibjinvolves the parameters of RBM; wijis a connection weight between the visible layer iand the hidden layer j; airepresents the bias of the visible layer, and bjdenotes the bias of the hidden layer. Pvhθ=eEvhθv,heEvhθis the joint probability distribution of this set of states vh[32]. Pvθ=heEvhθv,heEvhθis the marginal distribution of Pv,hθ. Since each visible layer and each hidden layer are independent, the activation probability of the hidden layer jand the ithvisible layer are shown in Eqs. (2) and (3), respectively [33].

Pvi=1hθ=σai+jwijhjE2
Phj=1vθ=σbj+iviwijE3

where σx=11+expx. In RBM_SVR, the number of neurons per layer of RBM is 300. Based on the abstracted vector output from the stacked RBM model, the SVR model predicts the probability of the traffic transfer among traffic zones yd, as shown in Eq. (4).

yd=RBM.SVRxE4

where xis the input dataset, RBM.SVRrepresents the RBM_SVR model.

4. An improved RBM-SVR algorithm

4.1 Principles of GWO algorithm

Assume that in a D-dimensional search space, the population size X=X1X2XN˜is composed of Nindividuals. Xih=Xih1Xih2XihDis the location of the gray wolf ih-the solution to the optimization problem. The top three wolves of the optimal solution of the objective function are wolf α, wolf β, and wolf δ, respectively. They are also the main wolves that guide the rest of the wolves to explore the optimal solution. The rest of the solution corresponds to the wolf as wolf ω. The parameters and explanations of the GWO algorithm are shown in Table 1.

ParameterInterpretation
tThe number of current iterations
CThe swing factor C=2r1
XptThe position of the prey after the tth iteration
XtThe position of the gray wolf during the tth iteration
r1A random number within 01
AThe convergence factor, A=2ar2a
r2A random number uniformly distributed within [0,1]
aalinearly decreases from 2 to 0 with the increase of the number of iterations
DπThe distances between the individual gray wolves, Dπ=CμXπtXt

Table 1.

Parameters and explanations of the GWO algorithm.

The update process of Xtis given by Eq. (5). The first three obtained optimal values are saved to enforce other searching individuals (including ω) to constantly update their positions according to the position of the optimal value, and the calculation method is expressed as Eqs. (6)(7).

Xt+1=XptACXptXtE5
Xμt+1=XπtAμDπE6
Xpt+1=13μ=13XμE7

where π=α,β,δ; μ=1,2,3. The distances between the other individual gray wolves and α, β, and δ, as well as the distances Dπ=CμXπtXtbetween them and the updated position of the gray wolf are be determined by and (6). Then, the position of the prey can be determined by Eq. (7).

4.2 Principles of the DE algorithm

Assume that in the D-dimensional search space, in the population size NP, Zgis the gth generation of the population, Zg=Z1gZ2gZnpg. Zkgis the kth individual in the gth generation of the population, Zkg=Zk,1gZk,2gZk,Dg, k=1,2,,NP, g=1,2,,gmax, and gmaxis the number of the last iteration.

4.2.1 Initialization of the population

Initially, the algorithm randomly generates the 0th generation of the population over the entire search space, and the value of the individual zk,q0in each dimension qis generated according to Eq. (8).

zk,q0=zk,qL+rand01zk,qUzk,qLE8

where q=1,2,,D, rand01is a random number, which is uniformly distributed within 01, zk,qLis the lower threshold of the individual population, zk,qUis the upper threshold of the individual population.

4.2.2 Mutation

Mutant individual is generated via Eq. (9).

τk,qg=zp1+Fzp2zp3E9

where zp1, zp2, zp3are three different parameter vectors randomly selected from the current population, and zp1zp2zp3i; Fis an amplifying factor within [0,1].

4.2.3 Crossover

The crossover process in the DE algorithm is expressed as Eq. (10).

μkg+1=τk,qg,rand01CRorq=rand01τk,qg,rand01CRorqrand01E10

where CR is the crossover probability within 01, and rand01is a random number, which is uniformly distributed within 01and used to guarantee that at least one-dimensional component comes from the target vector Zk.

4.2.4 Selection

Selection operation compares the vector μkg+1and the vector zkgby an evaluation function, which is given by Eq. (11).

zk,qg+1=μkg+1,fμkg+1<fzkgzkg,fμkg+1fzkgE11

Therefore, this mechanism allows the populations of the offspring to evolve based on the current population. This optimization mechanism can improve the average optimization ability of the population and converge the optimal solution.

4.3 DEGWO algorithm

In the DEGWO algorithm, Sdegwo.dbn=NPgmaxCRDublbFwhere NPdenotes population size, gmaxdenotes the maximum number of iterations, uband lbare the search range. rtestand rtraindenote the error in test and learning procedure respectively. Table 2 is the specific procedure employing the DE and the GWO algorithms to optimize parameters csand γsin the RBM-SVR deep learning model.

Algorithm 1. RBM_SVR _DEGWO
Input:Dtrain=x1y1(x2y2)(xmym), Dtest=x1y1(x2y2)(xnyn), Sdegwo.dbn;
Output:rtestZparent.α
Initialize a, A, C, Zparentand objective function Vparent
for each individual wolf kdo
VparentRBM_SVR (Dtrain, Dtest, Zparent)
end for
sort Vparent
compute top three gray wolf individuals XαXβXδ
for each generation gdo
update a2g2/gmax
for each individual wolf kdo
ZparentXp
VparentRBM_SVR (Dtrain, Dtest, Zparent)
compute mutant individuals τk,qzp1+Fzp2zp3
compute children population Zchildμk,q
VchildRBM_SVR (Dtrain, Dtest, Zchild)
end for
for each individual wolf kdo
update Zparentand Vparent
end for
end for
update the parameters in DBN SZparent.α
RBM_SVR (Dtrain, Dtest, S)
returnrtestZparent.α

Table 2.

The procedure of RBM_SVR _DEGWO algorithm.

5. A hybrid model based on RBM SVR DEGWO and XGBoost

5.1 XGBoost principle

5.1.1 Model function form

This paper assumes that the data set as the input sample is D=Xiziand the XGBoost model is a linear model (logistic regression, linear regression). The linear model is an additive model. The number of learning trees is n[34].The XGBoost model uses a pattern of linear superposition to calculate the predicted value, as shown in Eq. (12).

ẑi=nNhnxjE12

Here, xiis a feature vector and iis the number of data points. hnxiis the regression tree function. hnH, His the set space of the regression trees.

H=hnz=αfxE13

In Eq. (13), f:RmT, fXindicates that sample Xis classified on a leaf node. Trepresents the number of leaf nodes of the tree. αis the score of the leaf node. αfxrepresents the predicted value of the regression tree for the sample.

5.1.2 XGBoost learning objective function

The objective function based on the parameter space is shown in the following Eq. (14).

κϕ=Lϕ+Ωϕ=i=1Ilzjẑi+nNΩhnE14

where Ωϕis a regularization term, indicating a penalty value for the complexity of the model. The regular term Ωϕin the linear model includes: the regular term L1, Ωα=λα1, and the regular L2, Ωα=λα2. Lϕis an error function that measures the fitting accuracy of the model. A can reduce model bias, such as square loss, exponential loss. Compared to GBDT, XGBoost adds a regular term to the objective function. XGBoost punishes the complexity of each regression tree and avoids overfitting during learning. XGBoost measures the complexity of the tree such as the number of internal nodes, the depth of the tree, the number of leaf nodes T, the leaf node score α, etc. XGBoost uses the regular term as shown in Eq. (15).

Ωh=γT+12λα2E15

5.1.3 Model optimization

In the model parameter optimization process, each iteration model is always added a new function on the optimal model obtained from the previous training. After the kth iteration, the prediction of the model is equal to the prediction function of the first k1th model prediction function combined with the kth tree, as shown by Eq. (16).

ẑik=i=1Ihnxj=ẑik1+hixjE16

The objective function can be rewritten to Eq. (17).

κk=ẑik=i=1Ilziẑik1+hiXi+ΩhiE17

In formula (17), the model’s goal is to learn the function of the kth tree. When the error function is replaced by a second-order Taylor expansion, the objective function can be rewritten as Eq. (18).

Lk=i=1Ilziẑik1+vihiXi+12gihk2Xi+ΩhkE18

When vi=ẑik1lziẑik1a = 1 and gi=ẑik12lziẑik1, the objective function is Eq. (19).

L˜k=i=1IvihiXi+12gihk2Xi+ΩhkE19

This objective function solves regression, classification, and sorting problems. Eqs. (20) and (21) are in the form of a tree structure of the regression tree function and the regular term. The objective function can be updated to Eq. (22).

hX=αfxE20
Ωh=γT+12λα2E21
L˜k=i=1Iviαfxi+12gihk2αfx2+γT+λ12j=1Tαj2E22

This article defines the sample set on each leaf node as Jj=ifxi=j. The objective function based on the form of leaf node accumulation is Eq. (23).

L˜k=j=1TiGjfiαj+12iGjgi+λλαj2+γT=J=1Tδjαj+12ηj+λαj2+γTE23

This paper assumes that the structure of the tree is a certain value (i.e., fxiis determined). To solve the problem of minimizing the objective function, we can make the derivative of the objective function zero. The optimal predicted score for each leaf node is Eq. (24). The formula for the minimum loss function is Eq. (25), which can be thought of as a function that scores the tree structure. The tree structure is gradually optimized as the score is reduced.

Aj=δjηj+λE24
L˜=j=1Tδj2ηj+λ+γT=12j=1TiGjfi2/iGjgi+λ+γTE25

5.1.4 Structure score

Eq. (25) is a function for scoring a tree structure, called structure score. The smaller the score, the better the tree structure is. The algorithm searches for the optimal tree structure by using Eq. (25). L˜represents the contribution of the leaf node to the overall loss. The goal of the algorithm is to minimize the loss, so the larger part of δj2ηj+λcould be as good as possible. This article expands a leaf node and defines the gain as shown in Eq. (26).

gain=12δj2ηL+λ+δR2ηR+λδL+δRj2ηL+ηR+λγE26

In Eq. (26), δj2ηL+λis the score of the left subtree, δR2ηR+λis the score of the right subtree, δL+δRj2ηL+ηR+λis the score without division, and γis the cost of the complexity after introducing the new leaf node. The larger the value of gain, the more loss after splitting is reduced. Therefore, when segmenting a leaf node, we calculate the gaincorresponding to all candidate features and select the segment with the largest gain.

5.1.5 Best branch

The core part of the XGBoost algorithm is to obtain the optimal node based on the maximum gain obtained. XGBoost looks for the best branch using a greedy algorithm. The greedy algorithm traverses all possible segmentation points of all features, calculating the Gain value and selecting the maximum value to complete the segmentation. The greedy algorithm is an algorithm that controls the local optimum to achieve global optimization. The decision tree algorithm can also be considered as a method of greedy algorithm. XGBoost is an integrated model of the tree. If each leaf is optimal, the overall generated tree structure is optimal. This avoids enumerating all possible tree structures. XGBoost uses the objective function to measure the structure of the tree, and then let the tree grow from depth 0. Each time a branch calculation is implemented, XGBoost calculates the reduction in the objective function. When the reduction is below a certain value, the tree will stop growing.

5.2 Hybrid model based on RBM_SVR_DEGWO and XGBoost

After the boosting tree is created, the XGBoost algorithm extracts the importance score for each attribute. The XGBoost importance score measures the value of features in improving decision tree construction. The more an attribute is used to build a decision tree, the more important it is [35]. In order to further improve the accuracy of prediction and analyze the importance of feature quantity, this paper uses XGBoost to extract the feature quantity importance score. By combining the proposed RBM_SVR _DEGWO model prediction value, this paper proposes a hybrid prediction model, as shown in Table 3.

Algorithm 2. Hybrid Algorithm based on RBM_SVR_DEGWO and XGBoost
Input:D=Xizi, Jj=ifxi=j
data sets normalization
Initialize gain
Initialize δL, ηL
for each kdo
δL0, ηL0
forjin sorted (J, by Xjk) do
ηLηL+gj, δLδL+fj
ηRηηL, δRδδL
scoremaxscore
end for
end for
Split with max score
yd=RBM.SVR.DEGWOXi
ye=EXGBOOSTXi
yhybrid=Eydye
Output:yhybrid

Table 3.

Hybrid algorithm based on RBM_SVR_DEGWO and XGBoost.

6. Experimental description and result analysis

This paper analyzes 2,468,059 trajectory data from Mobike’s shared bikes. The data covers more than 300,000 users and 400,000 shared bikes. The data of each rental trip includes the start time, the end time, the Geohash code of the starting position, the Geohash code of the ending position, the bicycle ID and the user ID.

GeoHash is an algorithm for spatial indexing. In the GeoHash theory, the Earth is considered to be a two-dimensional plane that can be divided into multiple sub-regions. The latitude and longitude inside the sub-area will correspond to the same code. GeoHash-based spatial indexing can improve the efficiency of spatial data for latitude and longitude retrieval. In this paper, GeoHash encodes a square plane separated by a square of latitude and longitude of 0.001373. To improve the prediction accuracy, this paper combines nine adjacent areas into a square area with a length of 411.9873 meters. This paper divides Beijing into 10 × 10 traffic zones and numbers them from 1 to 100. Various indicators of the traffic area will be used as input data for the prediction model, as shown in Table 4.

NumberVariable name
1Zone number of origin traffic zone
2Zone number of destination traffic zone
3Longitude of the origin point
4Latitude of the origin point
5Longitude of the destination point
6Latitude of the destination point
7Distance between the center points of the traffic area
8Absolute value of the difference in the numbers of traffic zone
9Number of the day

Table 4.

Input variables and interpretation.

The output of the model is the daily transfer probability of traffic flow among the traffic zones pI,Jt, which is given by Eq. (27). In the cities of N interconnected traffic areas, pI,Jtindicates the transfer probability of the traffic flow with the original point Iand the destination Jin day t.

pI,Jt=dI,JtJ=1NdI,JtE27

where I=1,2,3,,N; J=1,2,3,,N; dI,Jtrefers to the traffic flow with the original point Iand the destination Jin day d. pI,Jtrepresents the origin–destination (OD) probability distribution and reflects the distribution of demand in the city.

This paper builds a set of destinations that may correspond to the origin traffic zone of the test day. The calculated destination candidates can be used to predict the probability of the traffic flow among the traffic zones. In the experiment, we selected data of different adjacent days as 6 test groups (Table 5).

The first day of training dataThe 2nd day of training dataPrediction data
Test groupDateData amount
(trajectories)
DateData amount
(trajectories)
dateData amount
(trajectories)
12017/5/10262,5692017/5/11272,2102017/5/12265,173
22017/5/13225,2812017/5/14236,5942017/5/15279,554
32017/5/14236,5942017/5/15279,5542017/5/16288,719
42017/5/15279,5542017/5/16288,7192017/5/17322,201
52017/5/16288,7192017/5/17322,2012017/5/18315,758
62017/5/10262,5692017/5/15279,5542017/5/16288,719

Table 5.

Experimental group training data and test data.

Based on data for the past 2 days as the training data, this paper predicts the subsequent third day of the transfer probabilities of bike-sharing traffic flow. Figure 1 is the root mean square errors of a prediction result of transfer probabilities of bike-sharing traffic flow in Beijing based on the RBM_SVR_DEGWO algorithm.

Figure 1.

The root-mean-square errors of the predicted transfer probabilities of bike sharing traffic flow.

Compared to the surrounding area, the central area of the city has higher shared bicycle usage and more bicycle trajectory data. Therefore, the Root Mean Square Error of the central region is smaller.

To illustrate the performance of the RBM_SVR_DEGWO algorithm, we calculated the predicted values of the SVR algorithm, the RBM_SVR algorithm, and the RBM_SVR_DEGWO algorithm based on the data from the experimental groups in Table 5. To ensure the fairness of the results, the data, network structure and parameter settings consistent. Figure 2 shows the mean-square error bars of the predicted transfer probabilities of SVR, RBM_SVR, and RBM_SVR_DEGWO.

Figure 2.

The mean-square error bars of the predicted transfer probabilities of bike sharing traffic flow of six test groups for each type of comparison method.

The average values of the mean squared errors of the predicted values of the transfer probabilities of the algorithms SVR, RBM_SVR, and RBM_SVR_DEGWO are gradually reduced. The average mean square error of the SVR is 0.0916, the RBM_SVR is 0.0542, and the RBM_SVR_DEGWO is 0.0283. RBM improves the prediction accuracy of the model through the deep network structure. The DEGWO algorithm stabilizes the prediction value error to a lower value by optimizing the parameters of the RBM-SVR. Compared with SVR and RBM_SVR, RBM_SVR_DEGWO algorithm has better robustness.

According to the proposed hybrid algorithm of RBM_SVR_DEGWO and XGBoost, the value of transfer probabilities of bike-sharing traffic flow can be predicted. The data set for this experiment is from the grouped data of Table 5. The training data set, test data set, and feature variables are the same as those used in the previous experiments. Table 6 is the parameters and explanation of the XGBoost model.

ParameterValueInterpretation
early_stopping_rounds200If the loss function does not decrease after the model has been added to n trees continuously, the model will stop adding trees.
eval_metriclinearEvaluation index
eta0.3The shrinking step size used in the update process. Eta is used to prevent overfitting.
min_child_weight1Min_child_weight refers to the sum of the weights of the smallest leaf nodes. If the sample weight of a leaf node is less than min_child_weight then the splitting process ends.
max_depth6Maximum depth of the tree
lambda1Penalty factor for L2 regular terms
alpha0Penalty factor for L1 regular terms
objectivelinearLoss function

Table 6.

Parameters and explanations of the XGBoost model.

The root mean square error of the predicted values of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm is shown in Figure 3. In the six experimental groups, the mean, variance, kurtosis, maximum, minimum, and range of the predicted root mean square error of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm are shown in Figure 3.

Figure 3.

The root mean square error of the predicted values of the RBM_SVR_DEGWO algorithm, the XGBoost algorithm, and the hybrid algorithm.

The statistical characteristics of the proposed root mean square error of the algorithms are shown in Figure 4. The root-mean-square error of the predicted value of the mixed algorithm has a high kurtosis value. It indicates that the variance increases of root mean square error is caused by the extreme difference of low frequency greater than or less than the mean value. The plots of the minimum and variance indicate that RBM_SVR_DEGWO can achieve higher prediction accuracy than XGBoost. XGBoost is more stable than RBM_SVR_DEGWO in the prediction process. In the six experimental groups, compared with the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the mean, variance, maximum, minimum, and range of the root mean square error of the predicted value of the hybrid algorithm are lower. Therefore, by combining the prediction results of the RBM_SVR_DEGWO algorithm and the XGBoost algorithm, the hybrid algorithm improves the prediction accuracy and obtains a lower root mean square error of the predicted value. XGBoost scores the importance of each feature based on the number of times the feature is used to segment the sample in all trees and the average gain in all trees. In the six experimental groups, the ranking of each input feature variable is as shown in Figure 5.

Figure 4.

Statistical characteristics of the root mean square error.

Figure 5.

Importance analysis of characteristic variables.

The main factors affecting the transfer probabilities of bike-sharing traffic flow are the destination traffic zone number, the origin traffic zone number, and the absolute value of the difference between the numbers of traffic zone. It shows that the shared bike rider’s choice of destination is usually affected by the starting point, the end position and the distance of the journey. The shared bicycle service is suitable for short trips. Travel destinations for shared bike riders are usually nearby business and lifestyle centers, and bus stops. The dates of the data for the six groups of experiments are within weekdays. On a normal weekday, for the riders in the same community, the main travel destinations are somewhat similar and fixed. Therefore, information such as the cell number of the origin and destination becomes a key factor for predicting the probability of travel destination.

7. Conclusions

The principal objective of this study is to predict the traffic flow transfer probability of shared bicycle by proposing a hybrid deep learning algorithm and accurately reflect the transfer probability of the user’s OD demand. First, this paper constructs a deep-structured RBM model and connects it to the SVR model for predicting continuous probability values. Furthermore, we utilize the DEGWO optimization algorithm, named, to optimize the parameters csand γsin the stacked RBM-SVR algorithm. XGBoost improves the prediction accuracy and analyzes the importance of the feature variables in the input data.

Based on the comparison results, it demonstrates that the proposed hybrid algorithm outperformed the XGBoost model and RBM_SVR_DEGWO model. The XGBoost algorithm improves the stability of the prediction process and reduces the error of the RBM_SVR_DEGWO algorithm at extreme points. The deep-structured RBM algorithm simulates the probability distribution that best produces the training samples. In the case of massive training data, RBM improves the efficiency of algorithm calculation utilizing Gibbs sampling of small-batch data. In the DEGWO algorithm, the GWO algorithm guarantees the global search capability, and the DE algorithm avoids the fall into a local optimal through the mutant individual, crossover, and selection operations.

Download

chapter PDF

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Wenwen Tu (October 29th 2019). Deep Learning Based Prediction of Transfer Probability of Shared Bikes Data [Online First], IntechOpen, DOI: 10.5772/intechopen.89835. Available from:

chapter statistics

36total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us