In this chapter, we propose a probabilistic model for train delay propagation. Formulas are derived for the probability distributions of arrival headways and knock-on delays in terms of the distributions of the primary delay duration and the departure headways. We prove some key mathematical statements. The formulas obtained make it possible to predict the frequency of train arrival delays and to determine optimal traffic adjustments. Several important special cases of the initial probability distributions are considered. The results of the theoretical analysis are verified by comparison with statistical data on train traffic on the Russian railways.
train delay propagation
CC FEB RAS, Russia
CC FEB RAS, Russia
Train movement is subject to a variety of random factors, which lead to unplanned delays. These delays scatter the arrival times and thus inconvenience passengers and consignees. Knowing the properties of the arrival time distribution makes it possible to predict the characteristics of train traffic and to make sound decisions in managing the transportation process. This, in turn, helps to improve the punctuality of train traffic and to save resources, in particular electric power.
The properties of the arrival headway distributions allow us to estimate the probability that delays emerge, together with their characteristics, which are important from a practical point of view. Probabilistic modeling of the delay propagation process along the train flow is the main tool for solving this problem.
Models for the distribution of delays in a dense train flow fall into two classes: deterministic and stochastic. Stochastic models take into account the unpredictable nature of obstacles on the railway. The mathematical model proposed in the present chapter makes it possible to determine the probability distributions of the arrival headways of two consecutive trains at a station. The distribution properties are analyzed for different scattering of the input random variables (the primary delay and the initial headways). The theoretical distributions are compared with real statistics of train traffic on the Russian railways.
A substantial body of literature is devoted to the study of the effect of train delays on railway operation. Deterministic models describing primary and knock-on delays were proposed in [1, 2]. These models, based on graph theory, make it possible to adjust the train traffic schedule. However, such an approach treats the various characteristics of train traffic (e.g., travel and dwell times, headways, etc.) as deterministic values and so does not take into account the uncertainties that arise in reality.
Stochastic modeling takes the influence of random factors (e.g., see [3, 4, 5, 6, 7, 8]) into account. Authors of  determine a probabilistic distribution of the arrival times. The problem of finding a distribution of arrival train delays is examined in . It should be noted that in these papers, special cases of primary delay distribution are considered. It is supposed in  that the random duration of the primary delay corresponds to some generalization of the exponential law. The paper  employs discretization of the delay distribution.
Some researchers have analyzed statistical data on deviations of the train arrival times from the planned ones. In particular, the papers [9, 10, 11] show that the scattering of these deviations corresponds to the exponential distribution.
3. Description of models and analysis of the arrival headways distribution
3.1. The first model
Trains follow one path one after another in one direction from station A to station B with the same average speed . Let the total number of trains be n. The distance from train j to train (j − 1) is denoted by , where j = 2, 3, …, n, is the minimal safe distance between trains, and , , …, are random variables (without any assumptions about their distributions). All trains have the same destination station.
Let us also introduce the notations: , . Suppose that train 1 departs from station A at the time . Then, the departure time of train m can be found as (as shown in Figure 1):
Assume that at some point in time, train 1 makes an unplanned stop. The duration of this stop is a random value . The subsequent trains suffer knock-on delays when the value is large enough. A following train stops when the distance to the train in front is reduced to . It is assumed that as soon as the train in front resumes running, the next one immediately follows it. The following problem is considered: find the probability distribution of the random arrival headway between trains (k − 1) and k at the destination B (denote this headway by ), assuming that only the first train makes an unplanned stop. In other words, we need to find the (cumulative) distribution functions , k = 2, 3, …, n. We call this the first problem.
3.2. The second model
Suppose that train 1 was delayed at station A at the moment and waited for a random time . If , then trains 2, 3, and so on, depart at the planned times: , , etc. If , then train 2 will be delayed and will depart at the time . Train 3 departs according to the same rule depending on the delay time of train 2, and so on. In this formulation, is the actual departure headway between the trains with numbers (k − 1) and k. It is required to determine the distribution functions of the random variables , k = 2, 3, …, n.
Example 1. Let n = 5, , , . The moments of planned departures of trains satisfy the equalities , . Figure 2 shows the process of headways forming, depending on the six values of the interval . The dots represent real train departure times that result from the primary delay .
The basic model assumptions are as follows: (1) only train 1 is exposed to the primary delay . (2) , k = 2, 3, …, n.
Denote by the real departure time of the train with number k, which depends on and .
We suppose that the departure times of trains satisfy the following two rules. Let k be fixed, . The first rule: if , then . The second rule: if , then . Obviously, .
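The two departure rules can be illustrated with a short simulation. This is only a sketch under a stated assumption: a train leaves at its planned time unless that would put it closer than the minimal headway to the actual departure of the train in front, in which case it leaves exactly one minimal headway later. The names `planned`, `min_headway` and `primary_delay`, and the numbers used, are hypothetical and are not the values of Example 1.

```python
def actual_departures(planned, min_headway, primary_delay):
    """Actual departure times when only train 1 suffers a primary delay.

    Assumed rule (a reading of the two rules in the text): train k departs
    at max(planned time, actual departure of train k-1 + min_headway).
    """
    actual = [planned[0] + primary_delay]
    for t_planned in planned[1:]:
        earliest_allowed = actual[-1] + min_headway
        actual.append(max(t_planned, earliest_allowed))
    return actual


def headways(times):
    """Differences between consecutive departure times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Illustrative numbers only (minutes): 5 trains planned every 5 min,
# minimal headway 3 min, primary delay of train 1 equal to 5 min.
planned = [0.0, 5.0, 10.0, 15.0, 20.0]
actual = actual_departures(planned, min_headway=3.0, primary_delay=5.0)
print(actual)            # [5.0, 8.0, 11.0, 15.0, 20.0]
print(headways(actual))  # [3.0, 3.0, 4.0, 5.0]
```

In this run the delay is absorbed by the fourth train: trains 2 and 3 leave at the minimal headway, and trains 4 and 5 leave on schedule.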
In what follows, we use the notation where A is an arbitrary set on the real line R.
Suppose that the total number of trains is equal to . Formally, we set if . Let us proceed to the formulation of the obtained results. The proofs of most of the assertions are omitted here because of space limitations; they are lengthy and will be published in a separate work.
Theorem 1. 1. If, then,,.
2. Let k be a fixed integer,. If , then .
3. If , then
Theorem 2. Let. For any k,, the following formula holds
Let us introduce the notations, , . Note that . We denote by the density function of in the case when it is absolutely continuous.
Further, some corollaries of Theorem 2 are formulated.
Corollary 1. Let,, be arbitrary positive numbers, then for
Example 2. Let the primary delay have exponential distribution, that is,
As initial parameters, we take the following quantities.
It should be noted that in this and the subsequent examples, we use the following measures for the values: , , , , , , , , (minutes, min); (1/min); (min2). The product (as mean of ), where is a shape parameter, is a scale parameter (in min).
Corollary 2. Let,, be a positive constant, then for
Example 3. Let has density (Eq. (8)). As initial parameters, we take the following quantities:
Figures 3 and 4 show that in the case of constant , the primary delay practically does not affect the fourth train and all subsequent ones. This is consistent with the equality which, as it is not difficult to verify, follows from Eq. (10).
Remark 1. It is known that the distribution of a sum of independent random variables is the convolution of their distributions. The convolution of distribution functions and is determined by the formula , where the integral sign means the improper Riemann-Stieltjes integral. We consider only piecewise-continuous distribution functions; the indicated integral then exists except in the case when and have at least one common discontinuity point (e.g., ). The convolution operation is commutative. In the case when , we shall use the following notations: , , . By definition, we assume that . The convolution of densities and is defined as the improper Riemann integral .
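The convolution of densities in Remark 1 can be checked numerically. The sketch below assumes nonnegative random variables, so that the integral runs from 0 to x, and uses a plain midpoint rule. The helper names and the choice of two exponential densities with rates 1 and 2 are illustrative assumptions; for that pair the convolution has the known closed form 2(e^{-x} - e^{-2x}).

```python
import math


def convolve_densities(f, g, x, steps=2000):
    """Midpoint-rule approximation of (f * g)(x) = integral_0^x f(x - y) g(y) dy.

    Assumes both densities vanish on the negative half-line.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += f(x - y) * g(y)
    return total * h


def exp_density(rate):
    """Density of the exponential distribution with the given rate."""
    return lambda t: rate * math.exp(-rate * t) if t >= 0 else 0.0


x = 2.0
numeric = convolve_densities(exp_density(1.0), exp_density(2.0), x)
exact = 2.0 * (math.exp(-x) - math.exp(-2.0 * x))
print(abs(numeric - exact) < 1e-6)  # True
```

Swapping the two arguments gives the same value up to rounding, which reflects the commutativity of the convolution.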
Corollary 3. Let,, be independent identically distributed random variables with a continuous distribution function. Letbe independent of,. Then
Corollary 4. Let,, be independent identically distributed random variables with a density function. Letbe independent of alland has a density function. Then
Remark 2. The integration limit “” can be replaced by 0 in Corollaries 3 and 4 if . On the other hand, we may consider in these corollaries the case when takes values of different signs. From a practical point of view, such an approach is acceptable if the probability that these random quantities take negative values is small enough. This assumption allows us to consider, for example, models in which the random variables are normally distributed with a sufficiently small variance and to use the property that the class of normal distributions is closed with respect to the convolution operation.
Example 4. Let has the density (Eq. (8)), and all have the same gamma density
where , is gamma function. Put
One can show that in the example under consideration it follows from Eqs. (15) and (16) that
where is incomplete gamma function. Graphs of the distribution functions with the parameters (Eq. (18)) are depicted in Figure 5.
It is not difficult to verify that for from Example 4 the following formula holds:
It can be seen from Figure 5 that the curves and so on practically merge. Hence, in the case under consideration, one can draw the following conclusion: the primary delay affects the fifth and all subsequent trains in approximately the same way as the fourth one.
Remark 3. We define the 0-fold convolution as a generalized function with the following property: the equality holds for any bounded continuous function . Then, Eq. (16) for coincides with Eq. (15).
We do not give proofs of the statements of Section 3 because of space limitations. They will be given in another work.
Denote by N the random number of knock-on delays (within the framework of the model under consideration).
Lemma 1. For each fixed integer m,,
Proof. It is easily seen that:
m = 1, 2, …, n − 2. This implies that
Here and below, the sign □ denotes the end of the proof.
The corollaries of this lemma are given below. Their proofs are simple and therefore we do not present them.
Corollary 5. Ifis a constant value,, then for every fixed integer m,, we have the equality.
Corollary 6. Ifis a constant value,, andis exponentially distributed with parameter, then for every fixed integer m,, the following equality holds,
Corollary 7. If , …,are independent identically distributed random variables with a density function, then for every fixed integer m,, we have the equality
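The probability that at least m knock-on delays occur can be estimated by simulation. The sketch below relies on assumptions that go beyond the text: train k is taken to be delayed exactly when the primary delay exceeds the sum of the time buffers of trains 2, …, k, where a buffer is the part of a departure headway above the minimal one. With a constant buffer c and an exponential primary delay with rate λ, this assumed model gives P(N ≥ m) = exp(−λmc), which the simulation reproduces. All names and numbers are hypothetical.

```python
import math
import random


def count_knock_on_delays(primary_delay, buffers):
    """Number of following trains delayed under the assumed absorption rule."""
    remaining = primary_delay
    count = 0
    for buffer_time in buffers:
        remaining -= buffer_time
        if remaining <= 0:
            break  # delay fully absorbed; later trains run on schedule
        count += 1
    return count


def prob_at_least(m, rate, buffer_time, n_followers, trials=100_000, seed=1):
    """Monte Carlo estimate of P(N >= m) for an exponential primary delay."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tau = rng.expovariate(rate)
        if count_knock_on_delays(tau, [buffer_time] * n_followers) >= m:
            hits += 1
    return hits / trials


# Illustrative values: rate 0.25 per min, constant buffer 2 min, m = 2.
estimate = prob_at_least(m=2, rate=0.25, buffer_time=2.0, n_followers=5)
closed_form = math.exp(-0.25 * 2 * 2.0)
print(abs(estimate - closed_form) < 0.01)
```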
In what follows, is the delay duration of the first train, , k = 2, …, n, is the knock-on delay of the k-th train. The problem is to find the distribution functions , k = 2, 3, …, n. Note that the solution of this problem, which we call the second problem, allows us to find the distribution of the deviations of the real arrival times from the planned ones.
In what follows, we will use the notation instead of .
Theorem 3. The following formula holds:
Corollary 8. The following formula holds:
It should be noted that within the framework of our model the deviation of the real arrival time from the planned one for k-th train coincides with , . Figure 6 illustrates this statement.
The dotted lines (lines and ) represent the scheduled trajectories of trains 1 and 2; the solid lines (1 and 2) depict the real trajectories, taking the delays into account. It can be seen that the arrival time of train 1 differs from the schedule by and that of train 2 by .
Denote , . It follows from the assumptions that the random variables have the same distribution function and are mutually independent. The random variable has the distribution function .
Corollary 9. The distribution function ofhas the following form:
The next Corollaries 10 and 11 follow from Corollary 9 in an obvious way.
Corollary 10. Let ,be some constant values. Then
Corollary 11. Let,be a constant value. Then
Corollary 12. Let,be independent identically distributed random variables with a continuous distribution function. Letbe independent of. Then
Corollary 13. Let,be independent identically distributed random variables with a density function. Letbe independent of,and has a density function. Then.
Proof. Let be the time the train spends covering the path of length (the distance to the place where the unplanned stop of train 1 occurred). We show that the equality holds under the condition . The departure time of train 1 after the stop is . The time at which train 2 reaches can be written as . No knock-on delay of train 2 occurs, i.e., , when the indicated time points are separated by at least the value , i.e., , or, equivalently, . This case is illustrated in Figure 7a.
A knock-on delay of duration occurs when . Indeed, since after a random stop the trains depart simultaneously, the equality holds, i.e., . This case is illustrated in Figure 7b. Thus, the validity of Eq. (26) is shown. □
Proof of Theorem 3. We shall use the method of mathematical induction. The equality (Eq. (20)) for k = 2 is established by Lemma 2. Let Eq. (20) be satisfied. We show that:
It follows from the inductive hypothesis that under the condition . But if the delay of the k-th train is 0, then the next train does not undergo any delay, that is, . The present case is illustrated in Figure 8.
In the case when , a knock-on delay of the k-th train occurs and equals (according to the inductive hypothesis). Further, two cases are possible: either (1) a delay entails a delay , or (2) .
Case 1. If the k-th train is delayed, then the (k + 1)-th one will be delayed only if , and its delay duration is (this fact follows from the equality of the departure times of the k-th and (k + 1)-th trains after an unscheduled stop: ). Case 1 is illustrated in Figure 9a.
Case 2. If the k-th train is delayed, then the (k + 1)-th one will not be delayed () only if . Case 2 is illustrated in Figure 9b. Note that if a knock-on delay of the k-th train occurs, the conflict of the k-th train with the (k + 1)-th is described similarly to the interaction of trains 1 and 2 (see Lemma 2). All the cases described lead to Eq. (20). □
Proof of Corollary 8. We indicate that Eq. (21) is similar to Eq. (20). According to the statement of Theorem 3, we have:
Using the method of mathematical induction and taking into account that , we obtain Eq. (21) from Eq. (28).□
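The case analysis in the proofs suggests a simple recursion: a train is undelayed if its predecessor is undelayed, and otherwise inherits the predecessor's delay reduced by its own time buffer, never going below zero. The sketch below encodes that reading, together with the closed form obtained by unrolling the recursion, and checks that the two agree. The function names, the assumption of nonnegative buffers, and the numbers are illustrative and are not taken from Eqs. (20) and (21).

```python
def delays_recursive(primary_delay, buffers):
    """Knock-on delays via the step-by-step rule: max(0, previous delay - buffer)."""
    delays = [primary_delay]
    for buffer_time in buffers:
        delays.append(max(0.0, delays[-1] - buffer_time))
    return delays


def delays_closed_form(primary_delay, buffers):
    """Same delays written directly: max(0, primary delay - sum of buffers so far)."""
    delays = [primary_delay]
    cumulative = 0.0
    for buffer_time in buffers:
        cumulative += buffer_time
        delays.append(max(0.0, primary_delay - cumulative))
    return delays


# Illustrative values in minutes; buffers are assumed nonnegative.
tau = 6.0
buffers = [2.0, 1.5, 3.0, 2.0]
print(delays_recursive(tau, buffers))    # [6.0, 4.0, 2.5, 0.0, 0.0]
print(delays_closed_form(tau, buffers))  # [6.0, 4.0, 2.5, 0.0, 0.0]
```

The agreement relies on the buffers being nonnegative: once a delay has dropped to zero, a negative buffer would reintroduce a delay in the recursion but not in the closed form.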
Proof of Corollary 9. It follows from Corollary 8 that if (see, e.g., Figure 8), and if (see, e.g., Figure 9a). Using the law of total probability, we obtain the following chain of equalities:
Proof of Corollary 12. Apply the well-known assertion to Eq. (22): if and are independent random variables, then for any function of two variables and any , the following equality holds: , where is the distribution function of . Consequently, . This implies Eq. (25). □
Proof of Corollary 13. The assertion follows from Eq. (25). □
Note that the function has a jump at zero which is equal to:
, where .
In the case, when and are absolutely continuous, it follows from Eq. (25) that
where and are the density functions of and , respectively, is the j-fold convolution of the density . In this case, we also have
If we assume that , then we deduce from Eq. (29) that
6. Corollary of Theorem 2 when the distribution of primary delay is a mixture of exponential and one-point distributions
Consider the cumulative distribution function of the following type:
where , , and are some parameters. Such distribution function is considered, for example, in . It is easy to see that , where is the distribution function of the degenerate distribution concentrated at the point , .
Let us find out the form of the distribution functions (Eqs. (13) and (14)) in the case of Eq. (33), when the function is continuous. In what follows, we mean that .
Lemma 3. Let the function G be defined byEq. (33), andbe continuous. Then
Proof. According to Eq. (33), one may conclude that function has a unique discontinuity point . Hence, the integral exists for any continuous distribution function . Note that if had a discontinuity point , then the function would also be discontinuous at the point for , and then the considered integral would not exist (see Remark 1). Since
Remark 4. It can easily be seen that the larger k is, the closer from Eq. (43) is to . This agrees with Figure 10 and with formulas (44) and (45), due to which we have , as , and also with the results of the calculations in Table 1.
Let the random variable be distributed with the density (Eq. (33)) with parameters , . Now, we find the condition on the parameter T under which the probability that at least m knock-on delays occur does not exceed a given probability p. Note that the departure headway is equal to .
According to Corollary 6, it is necessary to solve the inequality . As a result, we obtain the desired condition:
(see also ). Denote by the minimal T satisfying the inequality (Eq. (47)).
Example 6. Let us fix . The behavior of as a function of the continuous parameter m with and is shown in Figure 11a. Obviously, is a decreasing function of the argument p. Exact calculations can be made using the formula:
Let . The behavior of as a function of the continuous parameter m with and is shown in Figure 11b. In accordance with Eq. (48), is a decreasing function of the argument . In the case of exponential density , we have . Therefore, a decrease of leads to an increase in the average primary delay and in the departure headways (if we want to reduce the number of knock-on delays).
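The dependence of the required headway margin on m, p and the rate parameter can be explored with a small calculation. The sketch below works under simplifying assumptions that are not quoted from Eqs. (47) and (48): the primary delay is purely exponential with rate `rate`, every train has the same constant time buffer above the minimal headway, and at least m knock-on delays occur exactly when the primary delay exceeds m buffers. Under these assumptions the condition exp(−rate · m · buffer) ≤ p gives the smallest admissible buffer as ln(1/p) / (rate · m). How this buffer maps to the chapter's T is itself an assumption.

```python
import math


def min_buffer(m, p, rate):
    """Smallest constant buffer keeping P(at least m knock-on delays) <= p.

    Assumes an exponential primary delay with the given rate and the
    simplified absorption rule described in the lead-in.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(1.0 / p) / (rate * m)


# Illustrative values: the required buffer falls as m, p or the rate grows.
for m in (1, 2, 4):
    print(m, round(min_buffer(m, p=0.05, rate=0.25), 3))
```

The monotonicity matches the qualitative behavior described for Figures 11a and 11b: the threshold decreases in p and in the rate parameter.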
We also obtain the corollaries of Lemma 3 in the case when are distributed according to the gamma-law with the density (Eq. (17)).
Corollary 15. If primary delayhas an exponential distributionand,, has the density (Eq. (17)), then the following formulas are true:
Remark 5. The function is not a density, in particular, because of , where is the jump of the function at the origin. At the same time, the function is a density.
Corollary 15 can be reformulated as follows.
Corollary 15*. Let the primary delay be exponentially distributed with a parameter , and let , , have the same gamma distribution with the density (Eq. (17)). Then has a distribution function of the form Eq. (33) with , b = 0, and, consequently,
Remark 6. Let , Then by Corollary 15*, . Hence, as .
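The structure described in Corollary 15* can be checked by simulation. The sketch below assumes that the knock-on delay of train k equals the primary delay minus the sum of k − 1 independent gamma-distributed buffers, truncated at zero. Under that assumption, the probability of a positive delay is the Laplace transform of the gamma sum evaluated at the exponential rate, (1 + rate · scale)^(−shape · (k − 1)), and by the memoryless property the positive delays are again exponential with the same rate. The parameter values and function names are hypothetical.

```python
import random


def simulate_delay_k(k, rate, shape, scale, rng):
    """One draw of the assumed knock-on delay of train k."""
    tau = rng.expovariate(rate)
    total_buffer = sum(rng.gammavariate(shape, scale) for _ in range(k - 1))
    return max(0.0, tau - total_buffer)


def positive_delay_weight(k, rate, shape, scale):
    """P(delay of train k > 0) under the assumptions stated in the lead-in."""
    return (1.0 + rate * scale) ** (-shape * (k - 1))


rng = random.Random(2024)
k, rate, shape, scale = 3, 0.25, 2.0, 1.0
samples = [simulate_delay_k(k, rate, shape, scale, rng) for _ in range(100_000)]
positives = [s for s in samples if s > 0]

empirical_weight = len(positives) / len(samples)
mean_positive = sum(positives) / len(positives)
print(abs(empirical_weight - positive_delay_weight(k, rate, shape, scale)) < 0.01)
print(abs(mean_positive - 1.0 / rate) < 0.15)
```

Because the weight is a power of a number below 1, it tends to zero as k grows, in line with Remark 6.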
Example 7. Let be independent random variables having the same density function (Eq. (17)). We perform three series of experiments and investigate the behavior of the distribution of the arrival time deviations for various combinations of the parameters: , , k. The results are presented in graphical form in Figures 12–15. The functions are calculated by formula (49), and the functions by formula (50). Note that the product is the mean of . The parameter is equal to 0.25 and , as observed in reality.
7. Comparison with statistics of real train traffic
Let us consider the following random variable: the deviation of the real moment of arrival at a certain station from the scheduled one. Denote it by . Statistical analysis of data on this random variable, received from the Russian railways, has led to the conclusion that in many cases it obeys the modified exponential law with the distribution function of the form Eq. (33) with . Using data on the suburban trains of the direction “Moscow-Tver” for the periods January 11–15 and February 1–6, 2016, we obtained a sample from the distribution of of the size with the sample mean 1.44 and sample variance 2.7. We tested the hypothesis that obeys distribution (Eq. (33)) with and . To this end, we applied the Kolmogorov goodness-of-fit test with the significance level and found the hypothesis to be consistent with the sample data (see Figure 16).
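A Kolmogorov goodness-of-fit check of this kind can be sketched as follows. The code computes the Kolmogorov statistic, the largest distance between the empirical distribution function and a hypothesised continuous one, and compares it with the standard large-sample critical value 1.36/√n for the 0.05 level. The 0.05 level, the use of a plain exponential law with mean 1.44 as the hypothesised distribution, and the synthetic sample are all assumptions made for illustration; they are not the chapter's data or parameters.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov statistic sup|F_n - F| for a continuous hypothesised cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def ks_critical_value_5pct(n):
    """Asymptotic critical value of the Kolmogorov test at the 0.05 level."""
    return 1.36 / math.sqrt(n)


# Synthetic stand-in sample, NOT real railway data: exponential with mean 1.44.
rng = random.Random(7)
mean_delay = 1.44
sample = [rng.expovariate(1.0 / mean_delay) for _ in range(200)]

d = ks_statistic(sample, lambda x: 1.0 - math.exp(-x / mean_delay) if x > 0 else 0.0)
print(d, ks_critical_value_5pct(len(sample)))  # hypothesis kept if d is below the critical value
```

The critical value used here presumes a fully specified continuous distribution; if parameters are estimated from the same sample, or the hypothesised law has an atom, the test is no longer exact.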
Remark 7. It should be noted that in the example considered the deviation is nonnegative. In reality, however, it can often be either positive or negative. Positive values are due to delays that have arisen. Negative values occur because trains sometimes arrive early.
Remark 8. Although the hypothetical distribution function from Figure 16 is constructed for deviations without any details about the train number k, it is well correlated with the graph of the function with from Figure 12.
This allows us to assume that the distribution of the deviation is mainly determined by the distribution of the delay .
Remark 9. It was verified that if the length of the random variables have the same gamma distribution, any variation of the parameters of this distribution (and ) has a rather small influence on behavior of output distribution (see Figures 12–15).
Remark 10. Since the primary delay has a great influence on the formation of the output distribution of deviations from the schedule (), knowledge of the primary delay distribution in each particular situation makes it possible to predict the distribution of knock-on delays.
One important practical effect of the considered model is that it enables us to estimate the standard deviation (SD) of the actual arrival delays at the destination station. As an example, we calculated this parameter for the suburban railway line. The data analyzed were collected at the Tver station in the period of January 2016 and February 2016.
Example 8. Based on the statistical data, we may assume that has the exponential distribution with the parameter (i.e., has the distribution function (Eq. (33)) with , , ), and has the gamma distribution with the density function (Eq. (17)), where , . Using formulas (49) and (50) with , we have:
Thus, the theoretical . This agrees with the real statistics, which show an SD of 3.32 min for the station mentioned.
The mathematical model of train traffic proposed in this chapter allows us to find conditions on the initial headways under which a large number of delays occurs only rarely. In other words, the formulas for the distributions of arrival headways obtained in the chapter make it possible to optimize the frequency of train arrival delays.
Vladimir Chebotarev, Boris Davydov and Kseniya Kablukova (September 26th 2018). Probabilistic Model of Delay Propagation along the Train Flow, Probabilistic Modeling in System Engineering, Andrey Kostogryzov, IntechOpen, DOI: 10.5772/intechopen.75494. Available from: