Accident Prediction Modeling Approaches for European Railway Level Crossing Safety

Ci Liang; Mohamed Ghazel

doi:10.5772/intechopen.109865

Abstract

Safety is a core concern in railway operation. In Europe in particular, level crossing (LX) safety is one of the most critical issues for railways. LX accidents often lead to fatalities and weighted injuries and seriously damage the safety reputation of railways. Moreover, according to statistics, collisions between trains and motorized vehicles contribute most to LX accidents. With this in mind, this chapter elaborates on accident prediction modeling for train-vehicle collisions at LXs. The methods and findings discussed offer in-depth insight into the significant factors underlying collision occurrence and facilitate the identification of technical countermeasures to improve LX safety.

Keywords

level crossing safety
train-vehicle collisions
accident prediction modeling
nonlinear least-squares method
negative binomial regression method
Poisson regression method
zero-inflated Poisson regression method
zero-inflated negative binomial regression method
model performance evaluation

Author Information

Ci Liang*
- Harbin Institute of Technology, Harbin, China
- Université Gustave Eiffel (ex-IFSTTAR) - COSYS/ESTAS, Villeneuve-d’Ascq, France
Mohamed Ghazel
- Université Gustave Eiffel (ex-IFSTTAR) - COSYS/ESTAS, Villeneuve-d’Ascq, France

*Address all correspondence to: ciliang.lc@gmail.com

1. Introduction

The level crossing (LX) is railway property upon which road users are given permission to cross [1]. Accidents at LXs give rise to serious material and human damage, and the majority of accidents are caused by vehicle driver violations. As demonstrated by accident statistics, LX safety is one of the most critical issues that railway stakeholders need to deal with [2, 3]. In 2012, there were more than 118,000 LXs in the 28 countries of the European Union (E.U.) [4]. In some E.U. countries, LX accidents account for up to 50% of railway accidents [5]. In the UK, LXs account for 11.8 fatalities and weighted injuries on average per year, comprising 8.4% of the total system risk for the railway network [6]. There were 49 collisions between road vehicles and trains at LXs in Australia in 2011 [7]. In France, the railway network incorporates more than 18,000 LXs for 30,000 km of railway lines and around 13,000 LXs show heavy road and railway traffic [8]. In 2016, 111 train-vehicle collisions at French LXs led to 31 deaths [9]. This number was half the total number of collisions per year at LXs a decade ago, but still too large [10]. Due to nondeterministic causes, complex operation background, and the lack of thorough statistical analysis based on detailed accident/incident data, the risk assessment of LXs remains a challenging task. Therefore, there is a pressing need for a series of thorough analyses to understand the potential reasons for these accidents and to identify practical countermeasures to prevent accidents at LXs, thus significantly reducing the LX accidents.

In recent years, the Poisson regression model, the negative binomial (NB) regression model, and other variants of the Poisson regression model [11, 12] have gained popularity for dealing with risk/accident statistics. Ref. [13] adopted the expressions of the estimated expectation value λ̂ shown in Eq. (1), corresponding to the Poisson regression and NB regression models, respectively. Ref. [14] employed variants of the Poisson regression model, namely, the zero-inflated Poisson (ZIP) model and the hurdle Poisson model, for LX accident prediction using data from North Dakota. Ref. [15] compared the zero-inflated negative binomial (ZINB) model with the USDOT model [16] in terms of accident prediction accuracy, using LX accident data from Illinois. The results of that study show that the ZINB model predicts more accurately. It is worth noting that the expressions of the estimated λ̂ shown in Eq. (1) are not appropriate in our current study, since they are limited in handling zero observations and some impacting variables should not be in exponential form. Ref. [17] developed another model of λ̂, shown in Eq. (2). In this model, the product of the average daily road traffic V and the average daily railway traffic T (known as the conventional traffic moment) is adopted. However, using the conventional traffic moment hinders improving the accuracy of the prediction model:

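$$\hat{\lambda}_{Poi}=\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j\right),\qquad \hat{\lambda}_{NB}=\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j+\varepsilon\right),\tag{1}$$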

where β is the estimated regression coefficient, x is the impacting variable, and ε is the gamma-distributed error in NB regression model:

$$\hat{\lambda}=(V\times T)^{\beta_1}\exp\left(\sum_{j=1}^{m}\beta_j x_j+\sigma\right),\tag{2}$$

where σ=β0 in Poisson regression model or σ=β0+ε in NB regression model.

Based on these investigations, it is clear that there is a pressing need for an appropriate accident prediction model that should comprehensively consider contributing factors toward LX safety. Moreover, such a model should have high predictive accuracy. Therefore, in the present study, a new accident prediction model is developed to predict the accident frequency at LXs. Specifically, we focus on the SAL2 type of LX (i.e., an automated LX system with two half barriers and flashing lights), which is the most widely used type of LX in France and contributed most to the total number of accidents at French LXs from 1974 to 2014.

2. Method

In this section, an advanced accident prediction model is developed that makes it possible to rank risky LXs accurately and to identify the significant impacting parameters efficiently. The model considers the average daily road traffic, the average daily railway traffic, the annual road accidents, the vertical road profile, the horizontal road alignment, the road width, the crossing length, the railway speed limit, and the geographic region. The nonlinear least-squares (NLS) method, Poisson regression method, NB regression method, ZIP regression method, and ZINB regression method are employed to estimate the respective coefficients of the parameters in the prediction model.

2.1 Data sources and coding

The dataset used in our study, which covers SAL2 LXs in 21 administrative regions in mainland France from 2004 to 2013, was provided by SNCF Réseau (the French national railway infrastructure manager). The dataset includes 10 years of information about annual LX accident frequency, annual roadway accident statistics, and railway, roadway, and LX characteristics. In total, 8332 public SAL2 LXs are involved in our investigation. The impacting parameters relevant to LX accidents considered in our investigation fulfill the following criteria: (1) important in determining accident frequency, (2) more permanent in nature (sight obstruction, for example, is noted as a problematic factor because it involves alterable construction topography, vegetation, and other environmental elements), and (3) not accident-dependent [18]. The statistical characterization of the parameters considered in this investigation is shown in Table 1. It is worth noting that the road accident factor is the ratio of the number of road accidents in a given year to the average number of road accidents per year over the 10-year period considered, while the region risk factor is the general accident frequency per SAL2 LX in the corresponding region. The data coding is shown in Table 2.

Parameter	Description	Mean	Std. dev.
Railway traffic characteristics
Average daily railway traffic	The average number of trains crossing the LX daily;	26.1	30.2
Railway speed limit	The maximum permission speed of train within the LX section;	92.5	42.4
Roadway traffic characteristics
Average daily road traffic	The average number of road vehicles crossing the LX daily;	826.8	1.8e+03
Annual road accidents	The number of road accidents in a given year;	7.1e+04	9.7e+03
LX characteristics
Alignment	Horizontal road alignment shape: “straight,” “curve,” or “S”;	N/A	N/A
Profile	Vertical road profile shape: “normal,” “hump,” or “cavity”;	N/A	N/A
Length	The distance that road vehicles need to cross through the LX;	9.7	3.9
Width	The entering road width;	5.5	1.4
Region	The region of the LX considered;	N/A	N/A

Table 1.

Statistical characterization of parameters considered.

Parameter	Data coding
Railway traffic characteristics
Average daily railway traffic	Numerical, used directly;
Railway speed limit	Numerical, used directly;
Roadway traffic characteristics
Average daily road traffic	Numerical, used directly;
Annual road accidents	Road accident factor: Annual road accidents in a given year/Average road accidents per year over the period observed;
LX characteristics
Alignment	Alignment indicator: 0, 1, and 2 represent “straight”, “curve,” and “S,” respectively;
Profile	Profile indicator: 0 and 1 represent “normal” and “hump or cavity,” respectively;
LX width	Numerical, used directly;
Crossing length	Numerical, used directly;
Region	Region risk factor, highlighting the general LX-accident-prone region: The number of SAL2 accidents over the observation period in the region considered/The number of SAL2 LXs in the region considered;

Table 2.

Parameters considered and data coding.

2.2 Advanced accident prediction model

Here, the conventional traffic moment is defined as: Traffic moment = Road traffic frequency × Railway traffic frequency [19]. However, based on previous analyses [20], we adopt a variant called the “corrected moment,” or CM for short: $CM=V^{a}\times T^{b}$, where $a+b=1$ and the optimal value of $a$ in terms of fitting is calculated to be $a=0.354$ according to the statistical analysis performed by SNCF Réseau [21]. Therefore, we consider $V^{0.354}\times T^{0.646}$ as an integrated parameter that reflects the combined exposure frequency of both railway and road traffic.
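The corrected moment can be illustrated with a short Python sketch. The function names are hypothetical, and the inputs are simply the mean traffic values from Table 1 used as sample numbers; only the exponent a = 0.354 comes from the text.

```python
def corrected_moment(road_traffic: float, rail_traffic: float, a: float = 0.354) -> float:
    """Corrected moment CM = V**a * T**b with a + b = 1."""
    b = 1.0 - a
    return road_traffic ** a * rail_traffic ** b


def conventional_moment(road_traffic: float, rail_traffic: float) -> float:
    """Conventional traffic moment V * T."""
    return road_traffic * rail_traffic


# Mean traffic values reported in Table 1, used here only as sample inputs.
V_MEAN, T_MEAN = 826.8, 26.1
cm = corrected_moment(V_MEAN, T_MEAN)

# Doubling one traffic stream scales CM by 2 raised to that stream's exponent.
ratio_rail = corrected_moment(V_MEAN, 2 * T_MEAN) / cm   # 2 ** 0.646
ratio_road = corrected_moment(2 * V_MEAN, T_MEAN) / cm   # 2 ** 0.354
```

Because the exponents sum to one, doubling both traffic streams doubles CM, whereas the conventional moment would quadruple.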

The developed advanced model takes into account various variables as interpreted in Table 2. The general form of the model is shown as follows:

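$$\lambda_{10Y}=K\times F_{RAcc}\times V^{a}\times T^{b}\times\exp\left(C_{Profile}\times I_{Profile}+C_{Align}\times I_{Align}+C_{Wid}\times \mathit{Wid}+C_{Leng}\times \mathit{Leng}+C_{RSL}\times \mathit{RSL}+C_{Reg}\times F_{Reg}\right),\tag{3}$$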

where λ10Y represents the annual accident frequency at a given SAL2 for a period of 10 years; FRAcc is the road accident factor, which is a time-dependent variable and reflects the variation of annual road accidents as time advances; K is the coefficient of FRAcc; V denotes the average daily road traffic; T denotes the average daily railway traffic; IProfile is the profile indicator and CProfile is the coefficient of IProfile; IAlign is the alignment indicator and CAlign is the coefficient of IAlign; Wid is the LX width and CWid is the coefficient of Wid; Leng is the crossing length and CLeng is the coefficient of Leng; RSL is the railway speed limit and CRSL is the coefficient of RSL; FReg is the region factor and CReg is the coefficient of FReg. Note that this model not only ranks risky LXs accurately but also allows significant parameters to be identified efficiently.
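As a minimal sketch of how Eq. (3) is evaluated, the Python function below plugs in the NLS point estimates reported later in Table 3. The function name, the dictionary keys, and the example crossing (Table 1 means, a straight and flat road, a road accident factor of 1.0, and an assumed region risk factor of 0.1) are hypothetical choices made for illustration.

```python
import math

# Point estimates copied from Table 3 (NLS regression).
TABLE3 = {
    "K": 2.703e-05,
    "C_profile": 3.626e-02,
    "C_align": 3.427e-01,
    "C_wid": 9.847e-02,
    "C_leng": 2.084e-02,
    "C_rsl": 3.089e-03,
    "C_reg": 4.962e-01,
}
A, B = 0.354, 0.646  # exponents of the corrected moment


def lambda_10y(f_racc, v, t, i_profile, i_align, wid, leng, rsl, f_reg, c=TABLE3):
    """Annual accident frequency at one SAL2 LX following Eq. (3)."""
    linear = (c["C_profile"] * i_profile + c["C_align"] * i_align
              + c["C_wid"] * wid + c["C_leng"] * leng
              + c["C_rsl"] * rsl + c["C_reg"] * f_reg)
    return c["K"] * f_racc * v ** A * t ** B * math.exp(linear)


# Hypothetical crossing: same inputs twice, differing only in road alignment.
base = lambda_10y(1.0, 826.8, 26.1, 0, 0, 5.5, 9.7, 92.5, 0.1)
curved = lambda_10y(1.0, 826.8, 26.1, 0, 1, 5.5, 9.7, 92.5, 0.1)
```

In this form, moving the alignment indicator from 0 to 1 multiplies the predicted frequency by exp(CAlign), while FRAcc, V, and T enter multiplicatively outside the exponential.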

2.2.1 Regression approaches

In this section, several regression approaches are adopted to estimate the coefficients associated with the parameters of our model. The nonlinear least-squares (NLS) technique and the Gauss-Newton algorithm [22] are considered first. Consider a fitting model function $y=f(x,\boldsymbol{\beta})$ that depends, in addition to the variable $x$, on a vector of $l$ parameters $\boldsymbol{\beta}=(\beta_1,\beta_2,\ldots,\beta_l)$. The goal is to find the vector $\boldsymbol{\beta}$ for which the model function best fits the observed data in the least-squares sense, that is, the vector that minimizes the sum of squared residuals $S$ over the $m$ observations:

$$S=\sum_{i=1}^{m}r_i^{2},\qquad m\ge l,\tag{4}$$

where $r_i$ is the residual between the actual observation and the fitting model estimation, $r_i=y_i-f(x_i,\boldsymbol{\beta})$.

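The minimum value of $S$ is obtained by setting the gradient to zero, $\partial S/\partial\beta_j=0$, i.e.,

$$\frac{\partial S}{\partial\beta_j}=2\sum_{i}r_i\frac{\partial r_i}{\partial\beta_j}=0,\qquad \beta_j\approx\beta_j^{k+1}=\beta_j^{k}+\Delta\beta_j,\tag{5}$$

where $k$ is the iteration number and $\Delta\beta_j$ is the shift parameter.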

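At each iteration step, the model is linearized by a first-order Taylor series expansion about $\boldsymbol{\beta}^{k}$:

$$f(x_i,\boldsymbol{\beta})\approx f(x_i,\boldsymbol{\beta}^{k})+\sum_{j=1}^{l}\frac{\partial f(x_i,\boldsymbol{\beta}^{k})}{\partial\beta_j}\left(\beta_j-\beta_j^{k}\right)\approx f(x_i,\boldsymbol{\beta}^{k})+\sum_{j=1}^{l}J_{ij}\Delta\beta_j,\tag{6}$$

where $J_{ij}$ is an element of the Jacobian matrix $\mathbf{J}$ and $\partial r_i/\partial\beta_j=-J_{ij}$.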

Therefore, $r_i$ can be rewritten as:

$$r_i=\Delta y_i-\sum_{s=1}^{l}J_{is}\Delta\beta_s,\qquad \Delta y_i=y_i-f(x_i,\boldsymbol{\beta}^{k}).\tag{7}$$

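By substituting the above expressions into the gradient equation in Eq. (5), we obtain the normal equations and their matrix notation:

$$\sum_{i=1}^{m}\sum_{s=1}^{l}J_{ij}J_{is}\Delta\beta_s=\sum_{i=1}^{m}J_{ij}\Delta y_i,\qquad \mathbf{J}^{T}\mathbf{J}\,\Delta\boldsymbol{\beta}=\mathbf{J}^{T}\Delta\mathbf{y}.\tag{8}$$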

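For a weighted NLS model, in which observation $i$ carries the weight $W_{ii}$ (the $i$-th diagonal element of a weight matrix $\mathbf{W}$), $S$ is modified as follows:

$$S=\sum_{i=1}^{m}W_{ii}r_i^{2},\qquad m\ge l.\tag{9}$$

Therefore, the matrix notation of the normal equations for a weighted NLS model is expressed as follows:

$$\mathbf{J}^{T}\mathbf{W}\mathbf{J}\,\Delta\boldsymbol{\beta}=\mathbf{J}^{T}\mathbf{W}\,\Delta\mathbf{y}.\tag{10}$$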

These aforementioned equations form the basis of the Gauss-Newton algorithm for solving an NLS problem.
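The iteration described by Eqs. (5)–(8) can be sketched in a few lines of Python. The example below is not the chapter's model: it fits a hypothetical two-parameter toy function f(x, β) = β₁·exp(β₂·x) to noise-free synthetic data, so that the 2×2 normal equations can be solved by hand with Cramer's rule. All names and numbers are assumptions made for illustration.

```python
import math


def model(x, beta):
    """Toy fitting function f(x, beta) = beta[0] * exp(beta[1] * x)."""
    return beta[0] * math.exp(beta[1] * x)


def jacobian_row(x, beta):
    """Partial derivatives of f with respect to beta[0] and beta[1]."""
    e = math.exp(beta[1] * x)
    return (e, beta[0] * x * e)


def gauss_newton(xs, ys, beta, iterations=50, tol=1e-12):
    """Iterate beta <- beta + delta, where (J^T J) delta = J^T dy as in Eq. (8)."""
    beta = list(beta)
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            j1, j2 = jacobian_row(x, beta)
            dy = y - model(x, beta)          # residual at the current iterate
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * dy
            g2 += j2 * dy
        det = a11 * a22 - a12 * a12
        d1 = (g1 * a22 - g2 * a12) / det     # Cramer's rule for the 2x2 system
        d2 = (a11 * g2 - a12 * g1) / det
        beta[0] += d1
        beta[1] += d2
        if abs(d1) + abs(d2) < tol:
            break
    return beta


# Noise-free data generated from beta = (2.0, 0.5); values are illustrative.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
estimate = gauss_newton(xs, ys, beta=[1.5, 0.4])
```

For a model with more parameters, the hand-written 2×2 solve would be replaced by a general linear solver, and a weighted fit would insert the weights into the accumulated sums as in Eq. (10).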

In fact, the Poisson regression model shown in Eq. (11) is a natural choice for modeling accident occurrence:

$$\mathrm{Poi}(X=k)=\frac{\lambda^{k}e^{-\lambda}}{k!},\qquad k=0,1,2,\ldots,\tag{11}$$

where $\mathrm{Poi}(X=k)$ is the probability of $k$ accidents occurring, $k\in\mathbb{N}$, and $\lambda$ is the expectation value of the number of accidents.

However, [23] indicates that accident frequency is likely to be over-dispersed (see Eq. (12)) and suggests using the negative binomial (NB) regression model as an alternative to the Poisson model:

$$\mathrm{VAR}(X)\begin{cases}=E(X), & \text{for equi-dispersed (Poisson)}\\ >E(X), & \text{for over-dispersed}\\ <E(X), & \text{for under-dispersed.}\end{cases}\tag{12}$$

The NB model as a special case of Poisson-Gamma mixture model is a variant of the Poisson model designed to deal with over-dispersed data [11, 24, 25]. The over-dispersion could come from several possible sources, e.g., omitted variables, uncertainty in exposure data, covariates, or nonhomogeneous LX environment [26]. The NB model considered in this study has the following expression:

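$$P_{NB}(X=k)=\frac{\Gamma\left(k+\frac{1}{\alpha}\right)}{\Gamma(k+1)\,\Gamma\left(\frac{1}{\alpha}\right)}\left(\frac{1}{1+\alpha\lambda}\right)^{1/\alpha}\left(\frac{\alpha\lambda}{1+\alpha\lambda}\right)^{k},\qquad k=0,1,2,\ldots,\tag{13}$$

where $P_{NB}(X=k)$ is the probability of $k$ accidents occurring, $k\in\mathbb{N}$, $\alpha$ is the dispersion parameter, and $\lambda$ is the expectation of the number of accidents.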

The relationship between the mean value and the variance in the NB model is given as follows:

$$\mathrm{VAR}(X)=\alpha E(X)^{2}+E(X),\tag{14}$$

so that α<0 indicates under-dispersion and α>0 indicates over-dispersion; in the case where α=0, the NB model reduces to the Poisson model.
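The mean-variance relationship in Eq. (14) can be checked numerically. The sketch below evaluates the Poisson and NB probabilities of Eqs. (11) and (13) in log space with the standard library and computes the moments by direct summation; the function names and the values λ = 0.8 and α = 0.5 are illustrative assumptions, not estimates from this study.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Eq. (11): lam**k * exp(-lam) / k!, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k: int, lam: float, alpha: float) -> float:
    """Eq. (13) with mean lam and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + alpha * lam))
             + k * math.log(alpha * lam / (1.0 + alpha * lam)))
    return math.exp(log_p)


def moments(pmf, upper=2000):
    """Mean and variance of a count distribution truncated at `upper`."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


LAM, ALPHA = 0.8, 0.5   # illustrative values, not estimates from the chapter
nb_mean, nb_var = moments(lambda k: nb_pmf(k, LAM, ALPHA))
poi_mean, poi_var = moments(lambda k: poisson_pmf(k, LAM))
```

With these inputs the NB variance exceeds its mean by α·λ², while the Poisson variance equals its mean.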

In practice, the count data may contain extra zeros relative to the Poisson or NB distribution. In this case, the ZIP or ZINB regression model is useful for analyzing such data [27]. The ZIP model is expressed as follows:

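$$P_{ZIP}(X=k)=\begin{cases}\omega+(1-\omega)\exp(-\lambda), & \text{for } k=0\\ (1-\omega)\exp(-\lambda)\lambda^{k}/k!, & \text{for } k>0,\end{cases}\tag{15}$$

where $P_{ZIP}(X=k)$ is the probability of $k$ accidents occurring, $k\in\mathbb{N}$, $\lambda$ is the expectation value of the number of accidents, $\omega$ is the zero-inflation (ZI) probability, and $\log\left(\frac{\omega}{1-\omega}\right)=z'\gamma$ is the ZI link function, in which $z'$ is the ZI covariate and $\gamma$ is the corresponding ZI coefficient. The mean value and variance of the ZIP model are $E(X)=(1-\omega)\lambda$ and $\mathrm{VAR}(X)=(1-\omega)\lambda(1+\omega\lambda)$.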

The ZINB model is expressed as follows:

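$$P_{ZINB}(X=k)=\begin{cases}\omega+(1-\omega)(1+\alpha\lambda)^{-1/\alpha}, & \text{for } k=0\\ (1-\omega)\dfrac{\Gamma\left(k+\frac{1}{\alpha}\right)}{\Gamma(k+1)\,\Gamma\left(\frac{1}{\alpha}\right)}\left(\dfrac{1}{1+\alpha\lambda}\right)^{1/\alpha}\left(\dfrac{\alpha\lambda}{1+\alpha\lambda}\right)^{k}, & \text{for } k>0,\end{cases}\tag{16}$$

where $P_{ZINB}(X=k)$ is the probability of $k$ accidents occurring, $k\in\mathbb{N}$, and $\lambda$ is the expectation value of the number of accidents. The mean value and variance of the ZINB model are $E(X)=(1-\omega)\lambda$ and $\mathrm{VAR}(X)=(1-\omega)\lambda(1+\omega\lambda+\alpha\lambda)$. The ZINB reduces to the ZIP in the limit $\alpha\to 0$.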
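The zero-inflated distributions in Eqs. (15) and (16) can be sketched in the same way, and the stated mean and variance expressions can be confirmed by direct summation. The function names and the parameter values (λ = 0.8, ω = 0.3, α = 0.5) are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(k, lam, omega):
    """Eq. (15): zero-inflated Poisson with inflation probability omega."""
    poisson = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    return omega + (1 - omega) * poisson if k == 0 else (1 - omega) * poisson


def zinb_pmf(k, lam, omega, alpha):
    """Eq. (16): zero-inflated negative binomial with dispersion alpha > 0."""
    r = 1.0 / alpha
    nb = math.exp(math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                  - r * math.log(1 + alpha * lam)
                  + k * math.log(alpha * lam / (1 + alpha * lam)))
    return omega + (1 - omega) * nb if k == 0 else (1 - omega) * nb


def moments(pmf, upper=500):
    """Mean and variance of a count distribution truncated at `upper`."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


LAM, OMEGA, ALPHA = 0.8, 0.3, 0.5   # illustrative values only
zip_mean, zip_var = moments(lambda k: zip_pmf(k, LAM, OMEGA))
zinb_mean, zinb_var = moments(lambda k: zinb_pmf(k, LAM, OMEGA, ALPHA))
```

Both distributions share the mean (1−ω)λ; the ZINB variance carries the additional αλ term inside the bracket.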

However, the NB and ZINB models are limited in handling under-dispersed data (α<0) [11]. That is why [13] proposed the Gamma model to handle under-dispersed samples. The Gamma model is given as follows:

$$P_{G}(X=k)=\mathrm{Gamma}(\beta k,\lambda)-\mathrm{Gamma}(\beta(k+1),\lambda),\tag{17}$$

where $P_{G}(X=k)$ is the probability of $k$ accidents occurring, $k\in\mathbb{N}$, $\lambda$ is the expectation of the number of accidents, and $\beta$ is the dispersion parameter. If β>1, there is under-dispersion; if β<1, there is over-dispersion; and if β=1, the Gamma model reduces to the Poisson model. However, the Gamma model, which relies on the function defined in Eq. (18), is limited by the time-dependent observation assumption and by zero observations, since the general Γ(x) restricts discrete responses to positive values:

$$\mathrm{Gamma}(\beta k,\lambda)=\begin{cases}1, & \text{for } k=0\\ \dfrac{1}{\Gamma(\beta k)}\displaystyle\int_{0}^{\lambda}u^{\beta k-1}e^{-u}\,du, & \text{for } k>0.\end{cases}\tag{18}$$
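A rough numerical sketch of the Gamma model follows. It reads the second term of Eq. (17) as Gamma(β(k+1), λ), an interpretation adopted here because it makes the probabilities telescope to one, and it evaluates the integral in Eq. (18) with a standard power series for the regularised lower incomplete gamma function. The function names and parameter values are assumptions made for illustration.

```python
import math


def reg_lower_gamma(a: float, x: float, terms: int = 200) -> float:
    """Regularised lower incomplete gamma: (1/Gamma(a)) * integral_0^x u**(a-1) e**(-u) du."""
    log_x = math.log(x)
    return sum(math.exp((a + n) * log_x - x - math.lgamma(a + n + 1))
               for n in range(terms))


def gamma_fn(beta: float, k: int, lam: float) -> float:
    """Eq. (18): equals 1 for k = 0 and the regularised integral otherwise."""
    return 1.0 if k == 0 else reg_lower_gamma(beta * k, lam)


def gamma_count_pmf(k: int, lam: float, beta: float) -> float:
    """Eq. (17), with the second term read as Gamma(beta * (k + 1), lam)."""
    return gamma_fn(beta, k, lam) - reg_lower_gamma(beta * (k + 1), lam)


# Illustrative check of under-dispersion for beta > 1 (values are assumptions).
under = [gamma_count_pmf(k, 0.8, 2.0) for k in range(40)]
under_mean = sum(k * p for k, p in enumerate(under))
under_var = sum((k - under_mean) ** 2 * p for k, p in enumerate(under))
```

With β = 1 the function reproduces the Poisson probabilities of Eq. (11), and with β = 2 the computed variance falls below the computed mean.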

According to the above discussion, the relationship between the mean value and the variance can be used to identify an appropriate regression model. Therefore, we first performed a preliminary variance analysis by means of group classification. Namely, the annual accident counts at SAL2 LXs during the 10 years were divided into 100 groups with the same number of samples in each group. Then, the variance and mean value of accidents in each group were calculated to analyze the relationship between the group variance and the group mean value. The variance analysis shows that the variance and mean value are very close to each other. Hence, we performed detailed analyses of the NLS, Poisson, ZIP, NB, and ZINB regression methods on the SAL2 LXs in our accident dataset to identify which model is most effective.
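The group-based variance analysis can be sketched as follows. Since the accident dataset is not reproduced here, the example draws synthetic Poisson counts with an arbitrary rate, and it assumes that groups are formed from consecutive equal-sized chunks; the grouping rule, function names, and all numbers are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def group_mean_variance(counts, n_groups):
    """Split counts into equal-sized groups; return (mean, variance) per group."""
    size = len(counts) // n_groups
    groups = [counts[i * size:(i + 1) * size] for i in range(n_groups)]
    return [(statistics.fmean(g), statistics.pvariance(g)) for g in groups]


rng = random.Random(0)                       # fixed seed for repeatability
counts = [sample_poisson(0.5, rng) for _ in range(50_000)]
pairs = group_mean_variance(counts, n_groups=100)
avg_mean = statistics.fmean([m for m, _ in pairs])
avg_var = statistics.fmean([v for _, v in pairs])
```

For data that really are Poisson, the ratio of group variance to group mean stays close to one; a ratio well above one would point toward the NB family instead.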

2.2.2 Regression modeling results

NLS regression:

When applying the NLS regression, the form of λ10Y is given by Eq. (3). The coefficients estimated by NLS regression are provided in Table 3. The criterion ∣t-statistic∣ > 1.96 is used to identify the significant parameters at a 95% confidence level. As a result, the railway speed limit, the average daily railway traffic, the average daily road traffic, the annual road accidents, the LX-accident-prone region, the road alignment, the LX width, and the crossing length are shown to have a significant and positive influence on SAL2 accident frequency. However, the test shows that the road profile is not a significant factor (∣t-statistic∣ = 0.635 ≪ 1.96); thus, the impact of the road profile can be neglected. Moreover, the coefficients of the variables in the exponential term reflect how sensitive the SAL2 accident frequency is to each of these variables. According to these sensitivities (rank indicated in brackets in Table 3), the LX-accident-prone region factor is the most sensitive contributor among these variables.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K	2.703e-05	5.078e-06	5.322	×
IProfile	CProfile	3.626e-02	5.706e-02	0.635
IAlign	CAlign	3.427e-01 (2)	2.942e-02	11.648	×
Wid	CWid	9.847e-02 (3)	1.494e-02	6.589	×
Leng	CLeng	2.084e-02 (4)	4.284e-03	4.865	×
RSL	CRSL	3.089e-03 (5)	7.586e-04	4.072	×
FReg	CReg	4.962e-01 (1)	1.722e-01	2.882	×

Table 3.

Results of the λ10Y NLS regression model.
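The significance column of Table 3 can be re-derived from the reported estimates and standard errors, assuming the usual definition of the t-statistic as the estimate divided by its standard error (the chapter does not spell this out). The dictionary keys below are hypothetical labels for the table rows.

```python
# (estimate, standard error) pairs copied from Table 3.
TABLE3_ROWS = {
    "K": (2.703e-05, 5.078e-06),
    "C_profile": (3.626e-02, 5.706e-02),
    "C_align": (3.427e-01, 2.942e-02),
    "C_wid": (9.847e-02, 1.494e-02),
    "C_leng": (2.084e-02, 4.284e-03),
    "C_rsl": (3.089e-03, 7.586e-04),
    "C_reg": (4.962e-01, 1.722e-01),
}
CRITICAL = 1.96  # two-sided 95% threshold used in the text


def t_statistic(estimate: float, std_error: float) -> float:
    """Ratio of a coefficient estimate to its standard error."""
    return estimate / std_error


t_values = {name: t_statistic(est, se) for name, (est, se) in TABLE3_ROWS.items()}
significant = {name: abs(t) > CRITICAL for name, t in t_values.items()}
```

Small differences from the tabulated t-statistics in the third decimal place are expected, because the table reports rounded estimates and standard errors.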

In order to assess the predictive accuracy of accident occurrence estimated by the NLS regression model λ10Y combined with the NB and ZINB distributions (see Section 3.1), we adopt the maximum likelihood estimation (MLE) method to estimate the dispersion parameter α of the dataset [28]. As expressed by Eq. (19) and Eq. (20), the values of α in NB and ZINB distributions are estimated, respectively, using R language to solve ∂l/∂α=0:

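$$l(\alpha)_{NB}=\ln\prod_{i}^{n}P_{NB}(X_i=y_i)=\sum_{i}\left(y_i\ln\lambda_i-\left(y_i+\alpha^{-1}\right)\ln(1+\alpha\lambda_i)+\sum_{v=0}^{y_i-1}\ln(1+\alpha v)\right),\tag{19}$$

$$l(\alpha)_{ZINB}=\ln\prod_{i}^{n}P_{ZINB}(X_i=y_i)=\begin{cases}\displaystyle\sum\ln\left[\omega_i+(1-\omega_i)\left(\frac{1}{1+\alpha\lambda_i}\right)^{1/\alpha}\right], & \text{if } y_i=0\\ \displaystyle\sum\left[\ln(1-\omega_i)+\ln\Gamma\left(\frac{1}{\alpha}+y_i\right)-\ln\Gamma(1+y_i)-\ln\Gamma\left(\frac{1}{\alpha}\right)+\frac{1}{\alpha}\ln\left(\frac{1}{1+\alpha\lambda_i}\right)+y_i\ln\left(1-\frac{1}{1+\alpha\lambda_i}\right)\right], & \text{if } y_i>0.\end{cases}\tag{20}$$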
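To make the estimation of α concrete, the sketch below codes the NB log-likelihood of Eq. (19) and maximises it with a golden-section search. This derivative-free search is a stand-in chosen for brevity; it is not the R routine used in the study, which solves ∂l/∂α = 0. The counts, the constant expected frequency, the search interval, and the function names are all hypothetical.

```python
import math


def nb_loglik_eq19(alpha, ys, lams):
    """Eq. (19): NB log-likelihood as a function of alpha (the term -ln(y!) is not included)."""
    total = 0.0
    for y, lam in zip(ys, lams):
        total += (y * math.log(lam)
                  - (y + 1.0 / alpha) * math.log(1.0 + alpha * lam)
                  + sum(math.log(1.0 + alpha * v) for v in range(y)))
    return total


def estimate_alpha(ys, lams, lo=1e-3, hi=20.0, iterations=100):
    """Golden-section search for the alpha maximising Eq. (19) on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if nb_loglik_eq19(c, ys, lams) > nb_loglik_eq19(d, ys, lams):
            hi = d          # the maximum lies in [lo, d]
        else:
            lo = c          # the maximum lies in [c, hi]
    return 0.5 * (lo + hi)


# Hypothetical over-dispersed counts with a constant expected frequency.
YS = [0, 0, 0, 1, 0, 3, 0, 0, 2, 0, 0, 5]
LAMS = [sum(YS) / len(YS)] * len(YS)
alpha_hat = estimate_alpha(YS, LAMS)
```

The search assumes the log-likelihood has a single maximum inside the interval, which holds for this toy sample but would need checking on real data.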

Poisson regression:

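When applying the Poisson regression, the general form of λ10Poi is given by $\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j\right)$. Therefore, we need to transform Eq. (3) into the following expression:

$$\lambda_{10Poi}=\begin{cases}0, & \text{if } F_{RAcc}=0,\ V=0 \text{ or } T=0\\ \exp\left(K_1+C_F\times F_{RAcc}+C_{CM}\times CM+C_{Profile}\times I_{Profile}+C_{Align}\times I_{Align}+C_{Wid}\times \mathit{Wid}+C_{Leng}\times \mathit{Leng}+C_{RSL}\times \mathit{RSL}+C_{Reg}\times F_{Reg}\right), & \text{if } F_{RAcc}\ne 0,\ V\ne 0, \text{ and } T\ne 0\end{cases}\tag{21}$$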
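The piecewise form of Eq. (21) can be sketched as a small Python function. The coefficient values are the Poisson estimates reported in Table 4; the function name, the dictionary keys, and the example crossing (Table 1 means with an assumed region risk factor of 0.1) are hypothetical.

```python
import math

# Point estimates copied from Table 4 (Poisson regression).
TABLE4 = {"K1": -9.562, "C_F": 0.636, "C_CM": 0.005, "C_profile": -0.076,
          "C_align": 0.326, "C_wid": 0.206, "C_leng": 0.030,
          "C_rsl": 0.011, "C_reg": 1.725}


def lambda_10_poi(f_racc, v, t, i_profile, i_align, wid, leng, rsl, f_reg, c=TABLE4):
    """Piecewise annual accident frequency following Eq. (21)."""
    if f_racc == 0 or v == 0 or t == 0:
        return 0.0                            # no exposure, no expected accidents
    cm = v ** 0.354 * t ** 0.646              # corrected moment
    eta = (c["K1"] + c["C_F"] * f_racc + c["C_CM"] * cm
           + c["C_profile"] * i_profile + c["C_align"] * i_align
           + c["C_wid"] * wid + c["C_leng"] * leng
           + c["C_rsl"] * rsl + c["C_reg"] * f_reg)
    return math.exp(eta)


# Hypothetical crossing evaluated for a straight and a curved road.
straight = lambda_10_poi(1.0, 826.8, 26.1, 0, 0, 5.5, 9.7, 92.5, 0.1)
curved = lambda_10_poi(1.0, 826.8, 26.1, 0, 1, 5.5, 9.7, 92.5, 0.1)
no_traffic = lambda_10_poi(1.0, 0.0, 26.1, 0, 0, 5.5, 9.7, 92.5, 0.1)
```

The explicit zero branch is needed because the exponential alone can never return zero, unlike Eq. (3), where FRAcc, V, and T multiply the exponential term.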

The results estimated through the Poisson regression approach are shown in Table 4. As in the NLS case, the road profile is not significant (∣t-statistic∣ = 0.621 ≪ 1.96). On the other hand, the impact of the road accident factor FRAcc is weakened when it enters in exponential form: it is not significant under the Poisson regression approach (∣t-statistic∣ = 1.913 < 1.96). Furthermore, according to the sensitivities of the parameters in the exponential term (rank indicated in brackets), the LX-accident-prone region factor is once again the most sensitive contributor among these parameters.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−9.562	0.440	−21.714	×
FRAcc	CF	0.636	0.332	1.913
CM	CCM	0.005 (6)	2.949e-04	17.144	×
IProfile	CProfile	−0.076	0.122	−0.621
IAlign	CAlign	0.326 (2)	0.069	4.756	×
Wid	CWid	0.206 (3)	0.026	8.051	×
Leng	CLeng	0.030 (4)	0.009	3.232	×
RSL	CRSL	0.011 (5)	0.001	7.895	×
FReg	CReg	1.725 (1)	0.334	5.165	×

Table 4.

Regression results of λ10Poi.

NB regression:

When applying the NB regression, the general form of λ10NB is given by $\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j+\varepsilon\right)$, and the model again needs to be expressed in the form of Eq. (21). The dispersion parameter α is estimated at 3.2394 in our study, automatically, by the iterative estimation algorithm. The estimated results of the NB regression are shown in Table 5. According to these results, the road profile is still not significant (∣t-statistic∣ = 0.850 ≪ 1.96). The impact of FRAcc in exponential form is not significant either when using the NB regression approach (∣t-statistic∣ = 1.793 < 1.96). Moreover, according to the sensitivities of the parameters in the exponential term (rank indicated in brackets), the LX-accident-prone region factor is still the most sensitive contributor among these parameters.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−9.424	0.457	−20.615	×
FRAcc	CF	0.616	0.343	1.793
CM	CCM	0.006 (6)	3.762e-04	16.493	×
IProfile	CProfile	−0.107	0.126	−0.850
IAlign	CAlign	0.298 (2)	0.072	4.159	×
Wid	CWid	0.199 (3)	0.028	7.173	×
Leng	CLeng	0.031 (4)	0.010	3.201	×
RSL	CRSL	0.010 (5)	0.001	7.034	×
FReg	CReg	1.508 (1)	0.351	4.294	×

Table 5.

Regression results of λ10NB.

ZIP regression:

When applying the ZIP regression, the general form of λ10ZIP is given by $\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j\right)$, and the model again needs to be expressed in the form of Eq. (21). The estimated results of the ZIP regression are shown in Table 6 (for nonzero observations) and Table 7 (for zero-inflation observations).

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−1.128e+01	7.586e-01	−14.867	×
FRAcc	CF	3.717e-01	4.202e-01	0.885
CM	CCM	6.221e-03 (4)	4.336e-04	14.347	×
IProfile	CProfile	−1.855e-01	1.513e-01	−1.226
IAlign	CAlign	1.483e-01	8.786e-02	1.688
Wid	CWid	4.397e-01 (2)	6.625e-02	6.636	×
Leng	CLeng	3.971e-02	1.725e-02	1.904
RSL	CRSL	1.432e-02 (3)	2.069e-03	6.921	×
FReg	CReg	2.319 (1)	6.655e-01	3.484	×

Table 6.

Count model regression results of λ10ZIP.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−1.574e+01	4.276	−3.680	×
FRAcc	CF	−1.104	1.646	−0.671
CM	CCM	1.584e-03	1.450e-03	1.093
IProfile	CProfile	−4.355e-01	6.531e-01	0.505
IAlign	CAlign	−1.185	6.141e-01	−1.931
Wid	CWid	1.024 (2)	2.241e-01	4.571	×
Leng	CLeng	8.231e-02	4.190e-02	1.964
RSL	CRSL	4.117e-02 (3)	1.449e-02	2.840	×
FReg	CReg	5.861 (1)	1.748	3.353	×

Table 7.

Zero-inflation model regression results of λ10ZIP.

According to the results of the ZIP regression approach, in the count model for nonzero observations, FRAcc, IProfile, IAlign, and Leng are not significant (∣t-statistic∣ < 1.96). Moreover, according to the sensitivities of the remaining significant parameters in the exponential term (rank indicated in brackets), the LX-accident-prone region factor is still the most sensitive contributor among these parameters. In the zero-inflation model, only Wid, RSL, and FReg are significant (∣t-statistic∣ > 1.96).

ZINB regression:

When applying the ZINB regression, the general form of λ10ZINB is given by $\exp\left(\sum_{j=1}^{m}\beta_0+\beta_j x_j+\varepsilon\right)$, and the model again needs to be expressed in the form of Eq. (21). The values of the dispersion parameter α for nonzero observations and zero-inflation observations are estimated at 3.8102 and 1.4069, respectively, automatically, by the iterative estimation algorithm. The estimated results of the ZINB regression are shown in Table 8 (for nonzero observations) and Table 9 (for zero-inflation observations). According to these results, in the count model for nonzero observations, CM, IAlign, and Wid are significant (∣t-statistic∣ > 1.96). According to the sensitivities of these three parameters (rank indicated in brackets), the road alignment indicator IAlign, which is ranked first in Table 8, is the most sensitive contributor among them. In the zero-inflation model, only FRAcc and CM are significant (∣t-statistic∣ > 1.96).

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−7.128	0.734	−9.709	×
FRAcc	CF	0.671	0.413	1.624
CM	CCM	4.486e-03 (3)	4.991e-04	8.990	×
IProfile	CProfile	−5.886e-02	0.144	−0.406
IAlign	CAlign	0.371 (1)	8.274e-02	4.495	×
Wid	CWid	0.145 (2)	4.558e-02	3.175	×
Leng	CLeng	3.219e-03	1.203e-02	0.268
RSL	CRSL	2.558e-03	1.954e-03	1.309
FReg	CReg	0.795	0.446	1.783

Table 8.

Count model regression results of λ10ZINB.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K1	−4.036	2.190	−6.709	×
FRAcc	CF	0.260 (1)	1.456	2.179	×
CM	CCM	6.685e-02 (2)	1.838e-02	3.636	×
IProfile	CProfile	0.705	0.544	1.296
IAlign	CAlign	0.535	0.328	1.632
Wid	CWid	8.873e-02	0.180	0.491
Leng	CLeng	0.114	6.639e-02	1.725
RSL	CRSL	5.456e-03	6.629e-03	0.823
FReg	CReg	1.632	1.679	0.972

Table 9.

Zero-inflation model regression results of λ10ZINB.

3. Model performance evaluation and discussion

In this section, we assess the performance of our prediction models and determine an appropriate statistical distribution to be combined with them, so as to ensure the most accurate estimation of the probability of accidents occurring at a given SAL2 in a given year. The Bayesian information criterion (BIC) [29], Akaike’s information criterion (AIC) [30], the Pearson chi-square statistic (PCS) test [31], and the degrees of freedom (DF) are used to evaluate the goodness of fit (GOF) of the model. They are expressed, respectively, as follows:

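$$BIC=n+n\times\ln(2\pi)+n\times\ln(RSS/n)+(l+1)\ln(n),\tag{22}$$

$$AIC=n+n\times\ln(2\pi)+n\times\ln(RSS/n)+2(l+1),\tag{23}$$

$$PCS=\sum_{i=1}^{n}\frac{(O_i-\lambda_i)^{2}}{\lambda_i},\tag{24}$$

$$DF=n-(l+1),\tag{25}$$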

where RSS is the sum of the squares of residuals between the annual accident frequencies observed and the annual accident frequencies estimated, n is the sample size, l is the number of independent exponential parameters, λi is the annual accident frequency expected, and Oi is the annual accident frequency observed.

The BIC and AIC are used to test the relative quality of models for a given dataset. Smaller BIC and AIC values indicate a better model fitting. The PCS test is used to determine if there is a significant difference between the values expected and the values observed. The PCS is roughly equal to DF if the model fits the data perfectly without any dispersion. Namely, the closer the PCS is to the DF, the better the model fits the data [14].
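The four goodness-of-fit measures translate directly into code. In the sketch below, Eq. (25) is read as DF = n − (l + 1); with n = 83,320 (inferred here as 8332 LXs × 10 years, a figure the chapter does not state explicitly) and l = 6 exponential parameters, this reading gives 83,313, the DF value reported for λ10Y in Table 10. The function names are hypothetical.

```python
import math


def bic(rss: float, n: int, l: int) -> float:
    """Eq. (22): Bayesian information criterion from the residual sum of squares."""
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + (l + 1) * math.log(n)


def aic(rss: float, n: int, l: int) -> float:
    """Eq. (23): Akaike's information criterion from the residual sum of squares."""
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + 2 * (l + 1)


def pcs(observed, expected) -> float:
    """Eq. (24): Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n: int, l: int) -> int:
    """Eq. (25), read as n - (l + 1)."""
    return n - (l + 1)


N_SAMPLES = 8332 * 10                              # inferred sample size
df_lambda_10y = degrees_of_freedom(N_SAMPLES, 6)   # six exponential parameters
df_lambda_10poi = degrees_of_freedom(N_SAMPLES, 8) # eight exponential parameters
```

BIC and AIC differ only in the penalty term, so for a fixed dataset their ordering of models can differ only when the models have different numbers of parameters.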

The log-likelihood statistic test (LL) is adopted to assess the GOF of the accident frequency prediction model combined with a statistical distribution. The larger the LL, the more preferred the model [14]. The mathematical expression of the LL is given as follows:

$$LL=\sum_{i=1}^{n}\ln\hat{P}_i,\tag{26}$$

where n is the sample size and P̂i is the estimated probability of accident frequency observed. P̂i is computed respectively according to the accident frequency prediction model combined with the Poisson or the NB distribution.
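A minimal sketch of the LL computation in Eq. (26) is given below for the Poisson and NB distributions. The observed counts, the constant expected frequency, and the dispersion value α = 3.0 are hypothetical inputs chosen only to show the mechanics; the function names are likewise assumptions.

```python
import math


def poisson_log_pmf(k, lam):
    """Logarithm of the Poisson probability in Eq. (11)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def nb_log_pmf(k, lam, alpha):
    """Logarithm of the NB probability in Eq. (13)."""
    r = 1.0 / alpha
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            - r * math.log(1 + alpha * lam)
            + k * math.log(alpha * lam / (1 + alpha * lam)))


def log_likelihood(observed, expected, log_pmf):
    """Eq. (26): sum of log-probabilities of the observed counts."""
    return sum(log_pmf(y, lam) for y, lam in zip(observed, expected))


observed = [0, 0, 0, 1, 0, 3, 0, 0, 2, 0, 0, 5]   # hypothetical counts
expected = [sum(observed) / len(observed)] * len(observed)
ll_poisson = log_likelihood(observed, expected, poisson_log_pmf)
ll_nb = log_likelihood(observed, expected, lambda k, lam: nb_log_pmf(k, lam, 3.0))
```

For this over-dispersed toy sample the NB distribution yields the larger LL, which is the direction of preference described in the text.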

3.1 Model performance comparison among variants of λ10Y

The results of the AIC, BIC, and PCS statistical tests are shown in Table 10, with the rank of each model in brackets. The following findings are obtained: (1) considering AIC and BIC, the λ10Y model gives better results, since its AIC and BIC values are much smaller than those of the λ10Poi, λ10NB, λ10ZIP, and λ10ZINB models; (2) in terms of the PCS test, the λ10Y model is also the most effective one, since its PCS is closest to the DF (the DFs of λ10Y, λ10Poi, λ10NB, λ10ZIP, and λ10ZINB are nearly identical).

Test	POI-λ10Y	NB-λ10Y	λ10Poi	λ10NB	λ10ZIP	λ10ZINB
AIC	−190,744 (1)	−190,744 (1)	−187,804 (5)	−189,942 (2)	−188,312 (4)	−189,826 (3)
BIC	−190,670 (1)	−190,670 (1)	−187,720 (5)	−189,858 (3)	−188,176 (4)	−189,935 (2)
PCS	65,796 (1)	65,796 (1)	125,495 (5)	123,715 (4)	118,185 (3)	110,496 (2)
DF	83,313	83,313	83,311	83,311	83,311	83,311
LL	−2599 (2)	−2596 (1)	−2732 (6)	−2711 (5)	−2701 (4)	−2631 (3)
Goodness score (the lower, the better)	5	4	21	14	15	10

Table 10.

Model GOF comparison among variants of λ10Y.
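The chapter does not define the goodness score explicitly, but the values in Table 10 are consistent with summing each model's ranks over the AIC, BIC, PCS, and LL tests. The sketch below reproduces the scores under that assumption; the dictionary keys are hypothetical labels for the table columns.

```python
# Ranks copied from Table 10, in the order AIC, BIC, PCS, LL.
RANKS = {
    "POI-lambda10Y": (1, 1, 1, 2),
    "NB-lambda10Y": (1, 1, 1, 1),
    "lambda10Poi": (5, 5, 5, 6),
    "lambda10NB": (2, 3, 4, 5),
    "lambda10ZIP": (4, 4, 3, 4),
    "lambda10ZINB": (3, 2, 2, 3),
}

# Assumed definition: the goodness score is the sum of the four ranks.
scores = {model: sum(ranks) for model, ranks in RANKS.items()}
best = min(scores, key=scores.get)
```

Under this reading, the model with the lowest total is NB-λ10Y, in line with the discussion of Table 10.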

LL test results are shown in Table 10. One can notice that, for the λ10Y model combined with either the Poisson or NB distribution, its GOFs are significantly better than λ10Poi and λ10NB models’ GOFs according to the LL test. Furthermore, the GOF of λ10Y combined with the NB distribution (NB-λ10Y) is better than when combined with the Poisson distribution (POI-λ10Y).

3.2 A comparison between λ10Y and two existing reference models

In this section, we compare the present model λ10Y with two other models that are widely used in related work. As mentioned in Section 1, the first widely used model is given in Eq. (1) [13, 14, 18]. In our study, this model can be specified as follows:

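$$\lambda_{TV}=\exp\left(K_2+C_V\times V+C_T\times T+C_F\times F_{RAcc}+C_{Profile}\times I_{Profile}+C_{Align}\times I_{Align}+C_{Wid}\times \mathit{Wid}+C_{Leng}\times \mathit{Leng}+C_{RSL}\times \mathit{RSL}+C_{Reg}\times F_{Reg}\right),\tag{27}$$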

where the average daily road traffic V and the average daily railway traffic T are applied separately in exponential form.

The second model as shown in Eq. (2) (e.g., [17, 32]) is specified as Eq. (28) in our study:

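$$\lambda_{Mon}=\exp\left(K_3+C_M\times\ln(V\times T)+C_F\times F_{RAcc}+C_{Profile}\times I_{Profile}+C_{Align}\times I_{Align}+C_{Wid}\times \mathit{Wid}+C_{Leng}\times \mathit{Leng}+C_{RSL}\times \mathit{RSL}+C_{Reg}\times F_{Reg}\right),\tag{28}$$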

where the conventional traffic moment V×T is applied.

It should be noted that the ZIP and ZINB models were also investigated for λTV and λMon but resulted in no higher goodness-of-fit values and a quite small number of significant parameters compared with the Poisson and NB models and, hence, were not reported in this section. The Poisson and NB regression results of the λTV and λMon are shown in Tables 11–14, respectively. One can notice that the impacts of road profile and road accident are still not significant in the λTV and λMon. The AIC, BIC, PCS, and LL tests and observed/estimated accident frequency comparison are given in Table 15. According to the quality test results discussed in Section 3.1, the λ10Y combined with the NB distribution (NB-λ10Y) shows the best prediction performance among the four investigated combinations. Therefore, we will only compare the NB-λ10Y with the λTV and λMon combined with the Poisson and NB distributions, respectively, in the following content.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K2	−9.807	0.413	−22.223	×
V	CV	1.098e-04 (7)	1.613e-05	6.811	×
T	CT	8.777e-03 (6)	1.115e-03	7.869	×
FRAcc	CF	0.636	0.333	1.913
IProfile	CProfile	−1.445e-01	1.209e-01	−1.195
IAlign	CAlign	3.319e-01 (2)	6.747e-02	4.919	×
Wid	CWid	2.059e-01 (3)	2.483e-02	8.292	×
Leng	CLeng	3.952e-02 (4)	7.868e-03	5.024	×
RSL	CRSL	1.154e-02 (5)	1.487e-03	7.759	×
FReg	CReg	1.750 (1)	3.463e-01	5.053	×

Table 11.

Poisson regression results of λTV.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K2	−9.882	4.531e-01	−21.810	×
V	CV	1.155e-04 (7)	1.683e-05	6.861	×
T	CT	9.152e-03 (6)	1.234e-03	7.416	×
FRAcc	CF	0.607	3.402e-01	1.784
IProfile	CProfile	−1.532e-01	1.243e-01	−1.232
IAlign	CAlign	3.240e-01 (2)	6.988e-02	4.636	×
Wid	CWid	2.212e-01 (3)	2.579e-02	8.575	×
Leng	CLeng	3.895e-02 (4)	8.415e-03	4.629	×
RSL	CRSL	1.160e-02 (5)	1.529e-03	7.589	×
FReg	CReg	1.739 (1)	3.575e-01	4.864	×

Table 12.

NB regression results of λTV.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K3	−11.816	4.540e-01	−26.023	×
ln(V×T)	CM	4.036e-01 (2)	2.776e-02	14.538	×
FRAcc	CF	6.359e-01	3.325e-01	1.913
IProfile	CProfile	−6.279e-02	1.205e-01	−0.521
IAlign	CAlign	2.875e-01 (3)	6.799e-02	4.228	×
Wid	CWid	1.185e-01 (4)	3.296e-02	3.596	×
Leng	CLeng	2.213e-02 (5)	9.530e-03	2.322	×
RSL	CRSL	8.811e-03 (6)	1.350e-03	6.527	×
FReg	CReg	1.446 (1)	3.358e-01	4.307	×

Table 13.

Poisson regression results of λMon.

Parameter	Coefficient	Estimated value	Standard error	t-statistic	Significant
	K2	−11.850	4.628e-01	−26.603	×
lnV×T	CM	4.034e-01 (2)	2.822e-02	14.297	×
FRAcc	CF	6.368e-01	3.382e-01	1.883
IProfile	CProfile	−7.103e-02	1.230e-01	−0.578
IAlign	CAlign	2.848e-01 (3)	6.960e-02	4.092	×
Wid	CWid	1.214e-01 (4)	3.361e-02	3.612	×
Leng	CLeng	2.204e-02 (5)	9.752e-03	2.260	×
RSL	CRSL	8.892e-03 (6)	1.368e-03	6.500	×
FReg	CReg	1.480 (1)	3.428e-01	4.316	×

Table 14.

NB regression results of λMon.

Test	NB-λ10Y	POI-λTV	NB-λTV	POI-λMon	NB-λMon
AIC	−190,744 (1)	−177,914 (5)	−179,842 (4)	−183,714 (3)	−186,532 (2)
BIC	−190,670 (1)	−177,610 (5)	−179,738 (4)	−183,587 (3)	−186,191 (2)
PCS	65,796 (1)	121,715 (5)	119,133 (4)	118,511 (3)	115,634 (2)
DF	83,313	83,310	83,310	83,311	83,311
LL	−2596 (1)	−2722 (5)	−2703 (3)	−2705 (4)	−2683 (2)
Goodness score (the lower, the better)	4	20	15	13	8

Table 15.

Model GOF comparison among λ10Y, λTV, and λMon.
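
The goodness score in Table 15 appears to be the sum of each model's ranks over the four tests. The following Python sketch reproduces it from the tabulated values, treating lower AIC, BIC, and PCS and higher LL as better; that ordering is an assumption, chosen because it matches the ranks printed in parentheses. The function names `ranks` and `goodness_scores` are hypothetical.

```python
MODELS = ["NB-λ10Y", "POI-λTV", "NB-λTV", "POI-λMon", "NB-λMon"]

# Values copied from Table 15, in the column order of MODELS.
TESTS = {
    "AIC": [-190744, -177914, -179842, -183714, -186532],
    "BIC": [-190670, -177610, -179738, -183587, -186191],
    "PCS": [65796, 121715, 119133, 118511, 115634],
    "LL": [-2596, -2722, -2703, -2705, -2683],
}

# Assumption: only the log-likelihood is ranked from highest to lowest.
HIGHER_IS_BETTER = {"LL"}


def ranks(values, higher_is_better=False):
    """Return the 1-based rank of each value (rank 1 is the best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def goodness_scores(tests):
    """Sum the per-test ranks of each model."""
    totals = [0] * len(MODELS)
    for name, values in tests.items():
        for i, rank in enumerate(ranks(values, name in HIGHER_IS_BETTER)):
            totals[i] += rank
    return dict(zip(MODELS, totals))


scores = goodness_scores(TESTS)
for model, score in scores.items():
    print(f"{model}: {score}")
```

With these inputs the rank sums match the last row of Table 15, with NB-λ10Y obtaining the lowest score.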

As shown in Table 15, NB-λ10Y ranks first on the AIC, BIC, and PCS tests, ahead of all λTV and λMon variants. It also ranks first on the LL test, which gives it the lowest overall goodness score (4, against 8 to 20 for the other models), so it remains the preferred model.

4. Conclusions

Based on our study, the following remarks should be highlighted:

The proposed corrected traffic moment is more effective in estimating the frequency of automobile-involved LX accidents than the conventional traffic moment, the average daily railway traffic alone, or the average daily road traffic alone. It is worth mentioning that the average daily railway traffic, with a power of 0.646, has a more decisive impact on the LX accident frequency than the average daily road traffic, with a power of 0.354. Moreover, the higher the combined exposure of railway and roadway traffic, the higher the likelihood of an accident occurring.
According to the analyses above, the form of λ10Y highlights the impact of the road accident factor FRAcc, whereas this impact is neglected in the λ10Poi, λ10NB, λTV, and λMon models (see Tables 4, 5, and 11–14). The impact of road accidents on the risk level was likely to be ignored in previous studies on LX safety analysis.
This study is the first to introduce the region LX-accident-prone factor (see Table 2), which interprets the variation of LX accident statistics across regions. According to the sensitivity degrees of the variables ranked in Table 3, among the LX characteristics, the risk of LX accidents is most sensitive to the region LX-accident-prone factor. However, many past studies neglect the impact of the region in which an LX is located. In fact, the accident history varies from one region to another and therefore affects the LX accident frequency to varying degrees in different regions.
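
To illustrate the first remark, the Python sketch below assumes that the corrected traffic moment takes the form V^0.354 × T^0.646, with V the average daily road traffic and T the average daily railway traffic. Only the two exponents come from the text; the product form, the function name `corrected_traffic_moment`, and the sample traffic values are assumptions made for illustration.

```python
ROAD_EXPONENT = 0.354  # power on average daily road traffic (from the text)
RAIL_EXPONENT = 0.646  # power on average daily railway traffic (from the text)


def corrected_traffic_moment(road_traffic, rail_traffic):
    """Assumed form: V**0.354 * T**0.646 (illustrative, see lead-in)."""
    return road_traffic ** ROAD_EXPONENT * rail_traffic ** RAIL_EXPONENT


# Hypothetical traffic levels, chosen only to show the relative effects.
base = corrected_traffic_moment(1000.0, 50.0)
double_road = corrected_traffic_moment(2000.0, 50.0) / base
double_rail = corrected_traffic_moment(1000.0, 100.0) / base

print(f"doubling road traffic multiplies the moment by {double_road:.3f}")
print(f"doubling rail traffic multiplies the moment by {double_rail:.3f}")
```

Under this assumed form, doubling railway traffic scales the moment by 2^0.646 (about 1.56), against 2^0.354 (about 1.28) for doubling road traffic, which is one way to read the statement that railway traffic has the more decisive impact.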

To sum up, the developed model λ10Y has trustworthy goodness of fit. Moreover, it predicts LX accident frequency with relatively high accuracy when combined with the NB distribution.

References

1. Read GJM, Salmon PM, Lenné MG, Stanton NA. Walking the line: Understanding pedestrian behaviour and risk at rail level crossings with cognitive work analysis. Applied Ergonomics. 2016;53:209-227
2. Ghazel M. Using stochastic petri nets for level-crossing collision risk assessment. IEEE Transactions on Intelligent Transportation Systems. 2009;10(4):668-677
3. Liu B, Ghazel M, Toguyéni A. Model-based diagnosis of multi-track level crossing plants. IEEE Transactions on Intelligent Transportation Systems. 2016;17(2):546-556
4. ERA. Railway safety performance in the European Union. 9(2) Agency Regulation 881/2004/EC. 2014
5. Ghazel M, El-Koursi EM. Two-half-barrier level crossings versus four-half-barrier level crossings: A comparative risk analysis study. IEEE Transactions on Intelligent Transportation Systems. 2014;15(3):1123-1133
6. Silmon J, Roberts C. Using functional analysis to determine the requirements for changes to critical systems: Railway level crossing case study. Reliability Engineering and System Safety. 2010;95(3):216-225
7. Australian Transport Safety Bureau. Australian Rail Safety Occurrence Data: 1 July 2002 to 30 June 2012 (ATSB Transport Safety Report RR-2012-010). Canberra, Australia: ATSB; 2012
8. SNCF Réseau. World Conference of Road Safety at Level Crossings. 2011. Available from: http://www.planetoscope.com/automobile/1271-nombre-de-collisions-aux-passages-a-niveau-en-france.html
9. Plesse G. Des détecteurs d'obstacles déployés aux passages à niveau. 2017. Available from: http://www.leparisien.fr/info-paris-ile-de-france-oise/transports/des-detecteurs-d-obstacles-deployes-aux-passages-a-niveau-02-06-2017-7011714.php
10. Liang C, Ghazel M, Cazier O, El-Koursi EM. Analyzing risky behavior of motorists during the closure cycle of railway level crossings. Safety Science. 2018;110:115-126
11. Lord D, Mannering F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice. 2010;44(5):291-305
12. Guikema SD, Quiring SM. Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data. Reliability Engineering and System Safety. 2012;99:178-182
13. Oh J, Washington SP, Nam D. Accident prediction model for railway-highway interfaces. Accident; Analysis and Prevention. 2006;38(2):346-356
14. Lu P, Tolliver D. Accident prediction model for public highway-rail grade crossings. Accident; Analysis and Prevention. 2016;90:3-81
15. Medina JC, Benekohal RF. Macroscopic models for accident prediction at railroad grade crossings: Comparisons with US Department of Transportation accident prediction formula. Transportation Research Record: Journal of the Transportation Research Board. 2015;2476:85-93
16. Chadwick SG, Zhou N, Saat MR. Highway-rail grade crossing safety challenges for shared operations of high-speed passenger and heavy freight rail in the US. Safety Science. 2014;68:128-137
17. Miranda-Moreno L, Fu L, Saccomanno FF, Labbe A. Alternative risk models for ranking locations for safety improvement. Transportation Research Record: Journal of the Transportation Research Board. 2005;1908:1-8
18. Austin RD, Carson JL. An alternative accident prediction model for highway-rail interfaces. Accident; Analysis and Prevention. 2002;34(1):31-42
19. Liang C, Ghazel M, Cazier O, El-Koursi EM. Risk analysis on level crossings using a causal Bayesian network based approach. Transportation Research Procedia. 2017;25:2172-2186
20. Liang C, Ghazel M, Cazier O, El-Koursi EM. Developing accident prediction model for railway level crossings. Safety Science. 2018;101:48-59
21. SNCF Réseau. SNCF. Statistical Analysis of Accidents at LXs. France: SNCF Réseau; 2010
22. Madsen K, Nielsen HB, Tingleff O. Methods for non-linear least squares problems. In: Informatics and Mathematical Modelling. 2nd ed. Denmark: Technical University of Denmark; 2004
23. Chang LY. Analysis of freeway accident frequencies: Negative binomial regression versus artificial neural network. Safety Science. 2005;43(8):541-557
24. Buddhavarapu P, Scott JG, Prozzi JA. Modeling unobserved heterogeneity using finite mixture random parameters for spatially correlated discrete count data. Transportation Research Part B: Methodological. 2016;91:492-510
25. Utkin LV, Coolen FPA, Gurov SV. Imprecise inference for warranty contract analysis. Reliability Engineering and System Safety. 2015;138:31-39
26. Miaou SP. The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident; Analysis and Prevention. 1994;26(4):471-482
27. Ridout M, Hinde J, Demétrio C. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219-223
28. Dai H, Bao Y, Bao M. Maximum likelihood estimate for the dispersion parameter of the negative binomial distribution. Statistics & Probability Letters. 2013;83(1):21-27
29. Weakliem DL. A critique of the Bayesian information criterion for model selection. Sociological Methods & Research. 1999;27(3):359-397
30. Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika. 1987;52(3):345-370
31. Pearson K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157-175
32. Saccomanno FF, Fu L, Miranda-Moreno L. Risk-based model for identifying highway-rail grade crossing blackspots. Transportation Research Record: Journal of the Transportation Research Board. 2004;1862:127-135


[1] 1. Read GJM, Salmon PM, Lenné MG, Stanton NA. Walking the line: Understanding pedestrian behaviour and risk at rail level crossings with cognitive work analysis. Applied Ergonomics. 2016;53:209-227

[2] 2. Ghazel M. Using stochastic petri nets for level-crossing collision risk assessment. IEEE Transactions on Intelligent Transportation Systems. 2009;10(4):668-677

[3] 3. Liu B, Ghazel M, Toguyéni A. Model-based diagnosis of multi-track level crossing plants. IEEE Transactions on Intelligent Transportation Systems. 2016;17(2):546-556

[4] 4. ERA. Railway safety performance in the European Union. 9(2) Agency Regulation 881/2004/EC. 2014

[5] 5. Ghazel M, El-Koursi EM. Two-half-barrier level crossings versus four-half-barrier level crossings: A comparative risk analysis study. IEEE Transactions on Intelligent Transportation Systems. 2014;15(3):1123-1133

[6] 6. Silmon J, Roberts C. Using functional analysis to determine the requirements for changes to critical systems: Railway level crossing case study. Reliability Engineering and System Safety. 2010;95(3):216-225

[7] 7. Australian Transport Safety Bureau. Australian Rail Safety Occurrence Data: 1 July 2002 to 30 June 2012 (ATSB Transport Safety Report RR-2012-010). Canberra, Australia: ATSB; 2012

[8] 8. SNCF Réseau. World Conference of Road Safety at Level Crossings. 2011. Available from: http://www.planetoscope.com/automobile/1271-nombre-de-collisions-aux- passages-a-niveau-en-france.html

[9] 9. Plesse G. Des détecteurs d‘obstacles déployés aux passages à niveau. 2017. Available from: http://www.leparisien.fr/info-paris-ile-de-france-oise/transports/des-detecteurs-d-obstacles-deployes-aux-passages-a-niveau-02-06-2017-7011714.php

[10] 10. Liang C, Ghazel M, Cazier O, El-Koursi EM. Analyzing risky behavior of motorists during the closure cycle of railway level crossings. Safety Science. 2018;110:115-126

[11] 11. Lord D, Mannering F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice. 2010;44(5):291-305

[12] 12. Guikema SD, Quiring SM. Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data. Reliability Engineering and System Safety. 2012;99:178-182

[13] 13. Oh J, Washington SP, Nam D. Accident prediction model for railway-highway interfaces. Accident; Analysis and Prevention. 2006;38(2):346-356

[14] 14. Lu P, Tolliver D. Accident Analysis & Prevention. Accident prediction model for public highway-rail grade crossings. 2016;90:3-81

[15] 15. Medina JC, Benekohal RF. Macroscopic models for accident prediction at railroad grade crossings: Comparisons with US Department of Transportation accident prediction formula. Transportation Research Record: Journal of the Transportation Research Board. 2015;2476:85-93

[16] 16. Chadwick SG, Zhou N, Saat MR. Highway-rail grade crossing safety challenges for shared operations of high-speed passenger and heavy freight rail in the US. Safety Science. 2014;68:128-137

[17] 17. Miranda-Moreno L, Fu L, Saccomanno FF, Labbe A. Alternative risk models for ranking locations for safety improvement. Transportation Research Record: Journal of the Transportation Research Board. 2005;1908:1-8

[18] 18. Austin RD, Carson JL. An alternative accident prediction model for highway-rail interfaces. Accident; Analysis and Prevention. 2002;34(1):31-42

[19] 19. Liang C, Ghazel M, Cazier O, El-Koursi EM. Risk analysis on level crossings using a causal Bayesian network based approach. Transportation Research Procedia. 2017;25:2172-2186

[20] 20. Liang C, Ghazel M, Cazier O, El-Koursi EM. Developing accident prediction model for railway level crossings. Safety Science. 2018;101:48-59

[21] 21. SNCF Réseau. SNCF. Statistical Analysis of Accidents at LXs. France: SNCF Réseau; 2010

[22] 22. Madsen K, Nielsen HB, Tingleff O. Methods for non-linear least squares problems. In: Informatics and Mathematical Modelling. 2nd ed. Denmark: Technical University of Denmark; 2004

[23] 23. Chang LY. Analysis of freeway accident frequencies: Negative binomial regression versus artificial neural network. Safety Science. 2005;43(8):541-557

[24] 24. Buddhavarapu P, Scott JG, Prozzi JA. Modeling unobserved heterogeneity using finite mixture random parameters for spatially correlated discrete count data. Transportation Research Part B: Methodological. 2016;91:492-510

[25] 25. Utkin LV, Coolen FPA, Gurov SV. Imprecise inference for warranty contract analysis. Reliability Engineering and System Safety. 2015;138:31-39

[26] 26. Miaou SP. The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident; Analysis and Prevention. 1994;26(4):471-482

[27] 27. Ridout M, Hinde J, DeméAtrio C. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219-223

[28] 28. Dai H, Bao Y, Bao M. Maximum likelihood estimate for the dispersion parameter of the negative binomial distribution. Statistics & Probability Letters. 2013;83(1):21-27

[29] 29. Weakliem DL. A critique of the Bayesian information criterion for model selection. Sociological Methods & Research. 1999;27(3):359-397

[30] 30. Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika. 1987;52(3):345-370

[31] 31. Pearson KX. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157-175

[32] 32. Saccomanno FF, Fu L, Miranda-Moreno L. Risk-based model for identifying highway-rail grade crossing blackspots. Transportation Research Record: Journal of the Transportation Research Board. 2004;1862:127-135

Accident Prediction Modeling Approaches for European Railway Level Crossing Safety

New Research on Railway Engineering and Transportation
