Open access peer-reviewed chapter

Accident Prediction Modeling Approaches for European Railway Level Crossing Safety

Written By

Ci Liang and Mohamed Ghazel

Reviewed: 05 January 2023 Published: 05 March 2023

DOI: 10.5772/intechopen.109865

From the Edited Volume

New Research on Railway Engineering and Transportation

Edited by Ali G. Hessami and Roderick Muttram


Abstract

Safety is a core concern in the railway operation. Particularly, in Europe, level crossing (LX) safety is one of the most critical issues for railways. LX accidents often lead to fatalities and weighted injuries and seriously hamper railway safety reputation. Moreover, according to statistics, collisions between trains and motorized vehicles contribute most to LX accidents. With this in mind, we will elaborate on accident prediction modeling for train-vehicle collisions at LXs in this chapter. The methods and findings discussed in this chapter will offer an in-depth insight for interpreting significant aspects underlying collision occurrence and facilitate identifying technical countermeasures to improve LX safety.

Keywords

  • level crossing safety
  • train-vehicle collisions
  • accident prediction modeling
  • nonlinear least-squares method
  • negative binomial regression method
  • Poisson regression method
  • zero-inflated Poisson regression method
  • zero-inflated negative binomial regression method
  • model performance evaluation

1. Introduction

The level crossing (LX) is railway property upon which road users are given permission to cross [1]. Accidents at LXs give rise to serious material and human damage, and the majority of accidents are caused by vehicle driver violations. As demonstrated by accident statistics, LX safety is one of the most critical issues that railway stakeholders need to deal with [2, 3]. In 2012, there were more than 118,000 LXs in the 28 countries of the European Union (E.U.) [4]. In some E.U. countries, LX accidents account for up to 50% of railway accidents [5]. In the UK, LXs account for 11.8 fatalities and weighted injuries on average per year, comprising 8.4% of the total system risk for the railway network [6]. There were 49 collisions between road vehicles and trains at LXs in Australia in 2011 [7]. In France, the railway network incorporates more than 18,000 LXs for 30,000 km of railway lines, and around 13,000 LXs show heavy road and railway traffic [8]. In 2016, 111 train-vehicle collisions at French LXs led to 31 deaths [9]. This is half the annual number of collisions at LXs a decade earlier, but it is still too high [10]. Due to nondeterministic causes, a complex operational background, and the lack of thorough statistical analysis based on detailed accident/incident data, the risk assessment of LXs remains a challenging task. Therefore, there is a pressing need for thorough analyses to understand the potential causes of these accidents and to identify practical countermeasures that significantly reduce LX accidents.

In recent years, the Poisson regression model, the negative binomial (NB) regression model, and other variants of the Poisson regression model [11, 12] have gained popularity for analyzing risk/accident statistics. Ref. [13] adopted the expressions of the estimated expectation value λ̂ shown in Eq. (1), corresponding to the Poisson regression and NB regression models, respectively. Ref. [14] employed variants of the Poisson regression model, namely the zero-inflated Poisson (ZIP) model and the hurdle Poisson model, for LX accident prediction using data from North Dakota. Ref. [15] compared the zero-inflated negative binomial (ZINB) model with the USDOT model [16] in terms of accident prediction accuracy, using LX accident data from Illinois; the results of that study show that the ZINB model predicts more accurately. It is worth noting that the expressions of λ̂ shown in Eq. (1) are not appropriate for our current study, since they are limited in their ability to handle zero observations and some impacting variables should not appear in exponential form. Ref. [17] developed another model of λ̂, shown in Eq. (2), which adopts the product of the average daily road traffic V and the average daily railway traffic T (known as the conventional traffic moment). However, using the conventional traffic moment hinders improving the accuracy of the prediction model:

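$$\hat{\lambda}_{Poi} = \exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\right), \qquad \hat{\lambda}_{NB} = \exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j + \varepsilon\right) \tag{1}$$

where β is the estimated regression coefficient, x is the impacting variable, and ε is the gamma-distributed error in the NB regression model:

$$\hat{\lambda} = (V \times T)^{\beta_1} \exp\left(\sum_{j=1}^{m} \beta_j x_j + \sigma\right) \tag{2}$$

where $\sigma = \beta_0$ in the Poisson regression model or $\sigma = \beta_0 + \varepsilon$ in the NB regression model.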

Based on these investigations, it is clear that there is a pressing need for an appropriate accident prediction model that should comprehensively consider contributing factors toward LX safety. Moreover, such a model should have high predictive accuracy. Therefore, in the present study, a new accident prediction model is developed to predict the accident frequency at LXs. Specifically, we focus on the SAL2 type of LX (i.e., an automated LX system with two half barriers and flashing lights), which is the most widely used type of LX in France and contributed most to the total number of accidents at French LXs from 1974 to 2014.


2. Method

In this section, an advanced accident prediction model is developed, which makes it possible to rank risky LXs accurately and to identify the significant impacting parameters efficiently. The model considers the average daily road traffic, the average daily railway traffic, the annual road accidents, the vertical road profile, the horizontal road alignment, the road width, the crossing length, the railway speed limit, and the geographic region. The nonlinear least-squares (NLS) method, Poisson regression method, NB regression method, ZIP regression method, and ZINB regression method are employed to estimate the respective coefficients of the parameters in the prediction model.

2.1 Data sources and coding

The dataset used in our study, which covers SAL2 LXs in 21 administrative regions in mainland France from 2004 to 2013, has been provided by SNCF Réseau (the French national railway infrastructure manager). The dataset includes 10 years of information about annual LX accident frequency, annual roadway accident statistics, and railway, roadway, and LX characteristics. In total, 8332 public SAL2 LXs are involved in our investigation. The impacting parameters relevant to LX accidents considered in our investigation fulfill the following characteristics: (1) important in determining accident frequency, (2) more permanent in nature (sight obstruction, for example, is noted as a problematic factor because it involves alterable construction topography, vegetation, and other environmental elements), and (3) not accident-dependent [18]. The statistical characterization of the parameters considered in this investigation is shown in Table 1. It is worth noting that the road accident factor is the ratio of the number of road accidents in a given year to the average number of road accidents per year over the 10-year period considered, while the region risk factor is the general accident frequency per SAL2 in the corresponding region. Overall, the data coding is shown in Table 2.

| Parameter | Description | Mean | Std. dev. |
| --- | --- | --- | --- |
| Railway traffic characteristics | | | |
| Average daily railway traffic | The average number of trains crossing the LX daily | 26.1 | 30.2 |
| Railway speed limit | The maximum permitted speed of trains within the LX section | 92.5 | 42.4 |
| Roadway traffic characteristics | | | |
| Average daily road traffic | The average number of road vehicles crossing the LX daily | 826.8 | 1.8e+03 |
| Annual road accidents | The number of road accidents in a given year | 7.1e+04 | 9.7e+03 |
| LX characteristics | | | |
| Alignment | Horizontal road alignment shape: “straight,” “curve,” or “S” | N/A | N/A |
| Profile | Vertical road profile shape: “normal,” “hump,” or “cavity” | N/A | N/A |
| Length | The distance that road vehicles need to cross through the LX | 9.7 | 3.9 |
| Width | The entering road width | 5.5 | 1.4 |
| Region | The region of the LX considered | N/A | N/A |

Table 1.

Statistical characterization of parameters considered.

| Parameter | Data coding |
| --- | --- |
| Railway traffic characteristics | |
| Average daily railway traffic | Numerical, used directly |
| Railway speed limit | Numerical, used directly |
| Roadway traffic characteristics | |
| Average daily road traffic | Numerical, used directly |
| Annual road accidents | Road accident factor: annual road accidents in a given year / average road accidents per year over the period observed |
| LX characteristics | |
| Alignment | Alignment indicator: 0, 1, and 2 represent “straight,” “curve,” and “S,” respectively |
| Profile | Profile indicator: 0 and 1 represent “normal” and “hump or cavity,” respectively |
| LX width | Numerical, used directly |
| Crossing length | Numerical, used directly |
| Region | Region risk factor, highlighting the general LX-accident-prone region: the number of SAL2 accidents over the observation period in the region considered / the number of SAL2 LXs in the region considered |

Table 2.

Parameters considered and data coding.

2.2 Advanced accident prediction model

Here, the conventional traffic moment is defined as: Traffic moment = Road traffic frequency × Railway traffic frequency [19]. However, based on previous analyses [20], we adopt a variant called the “corrected moment,” or CM for short: $CM = V^{a} \times T^{b}$, where $a + b = 1$ and the optimal value of a in terms of fitting is calculated to be $a = 0.354$ according to the statistical analysis performed by SNCF Réseau [21]. Therefore, we consider $V^{0.354} \times T^{0.646}$ as an integrated parameter that reflects the combined exposure frequency of both railway and road traffic.
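The difference between the two exposure measures can be illustrated with a minimal Python sketch. The function names are illustrative choices, and the inputs are simply the mean traffic values reported in Table 1, not data for any particular crossing.

```python
def conventional_moment(v: float, t: float) -> float:
    """Conventional traffic moment: road traffic times railway traffic."""
    return v * t


def corrected_moment(v: float, t: float, a: float = 0.354) -> float:
    """Corrected moment CM = V**a * T**b with b = 1 - a."""
    b = 1.0 - a
    return (v ** a) * (t ** b)


# Illustrative inputs: mean daily road and railway traffic from Table 1.
v_mean, t_mean = 826.8, 26.1
print("conventional moment:", conventional_moment(v_mean, t_mean))
print("corrected moment:   ", corrected_moment(v_mean, t_mean))
```

Because a + b = 1, the corrected moment behaves like a weighted geometric mean of V and T: when both traffic volumes are equal, CM equals that common value.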

The developed advanced model takes into account various variables as interpreted in Table 2. The general form of the model is shown as follows:

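$$\lambda_{10Y} = K \times F_{RAcc} \times V^{a} \times T^{b} \times \exp\left(C_{Profile} \times I_{Profile} + C_{Align} \times I_{Align} + C_{Wid} \times Wid + C_{Leng} \times Leng + C_{RSL} \times RSL + C_{Reg} \times F_{Reg}\right) \tag{3}$$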

where $\lambda_{10Y}$ represents the annual accident frequency at a given SAL2 for a period of 10 years; $F_{RAcc}$ is the road accident factor, which is a time-dependent variable and reflects the variation of annual road accidents as time advances; $K$ is the coefficient of $F_{RAcc}$; $V$ denotes the average daily road traffic; $T$ denotes the average daily railway traffic; $I_{Profile}$ is the profile indicator and $C_{Profile}$ is its coefficient; $I_{Align}$ is the alignment indicator and $C_{Align}$ is its coefficient; $Wid$ is the LX width and $C_{Wid}$ is its coefficient; $Leng$ is the crossing length and $C_{Leng}$ is its coefficient; $RSL$ is the railway speed limit and $C_{RSL}$ is its coefficient; $F_{Reg}$ is the region factor and $C_{Reg}$ is its coefficient. Note that this model not only ranks risky LXs accurately but also allows significant parameters to be identified efficiently.
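As a rough illustration of how Eq. (3) is evaluated for one crossing, the following Python sketch plugs in the NLS estimates reported in Table 3. The function name, the dictionary keys, and the example crossing (in particular the region factor value of 0.05) are hypothetical choices made for this sketch only.

```python
import math

# NLS estimates as reported in Table 3 of this chapter.
TABLE3 = {
    "K": 2.703e-05,
    "profile": 3.626e-02,
    "align": 3.427e-01,
    "wid": 9.847e-02,
    "leng": 2.084e-02,
    "rsl": 3.089e-03,
    "reg": 4.962e-01,
}


def lambda_10y(f_racc, v, t, i_profile, i_align, wid, leng, rsl, f_reg,
               a=0.354, coef=TABLE3):
    """Annual accident frequency at one SAL2 crossing, following Eq. (3)."""
    exposure = (v ** a) * (t ** (1.0 - a))  # corrected moment V^a * T^b
    linear = (coef["profile"] * i_profile + coef["align"] * i_align
              + coef["wid"] * wid + coef["leng"] * leng
              + coef["rsl"] * rsl + coef["reg"] * f_reg)
    return coef["K"] * f_racc * exposure * math.exp(linear)


# Hypothetical crossing: Table 1 mean values, straight road, normal profile,
# and an assumed region risk factor of 0.05 (illustrative only).
example = dict(f_racc=1.0, v=826.8, t=26.1, i_profile=0, i_align=0,
               wid=5.5, leng=9.7, rsl=92.5, f_reg=0.05)
print("estimated annual accident frequency:", lambda_10y(**example))
```

Under this form, the road accident factor and the traffic exposure scale the frequency multiplicatively, while each unit change in an exponential variable multiplies the frequency by the exponential of its coefficient.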

2.2.1 Regression approaches

In this section, several regression approaches are adopted to estimate the coefficients associated with the parameters of our model. The nonlinear least-squares (NLS) technique and the Gauss-Newton algorithm [22] are considered first to estimate the variable coefficients in our model. Consider a fitting model function $y = f(x, \boldsymbol{\beta})$ which, in addition to the variable $x$, depends on a vector of $l$ parameters $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_l)$. The goal is to find the vector $\boldsymbol{\beta}$ for which the model function best fits the observed data in the least-squares sense, in other words, to minimize the sum of squared residuals $S$ expressed as follows:

$$S = \sum_{i=1}^{m} r_i^{2}, \qquad m \ge l \tag{4}$$

where $r_i$ is the residual between the actual observation and the fitting model estimation, $r_i = y_i - f(x_i, \boldsymbol{\beta})$.

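The minimum value of $S$ is obtained by setting the gradient to zero, $\partial S / \partial \beta_j = 0$, i.e.,

$$\frac{\partial S}{\partial \beta_j} = 2 \sum_{i} r_i \frac{\partial r_i}{\partial \beta_j} = 0, \qquad \beta_j \approx \beta_j^{k+1} = \beta_j^{k} + \Delta\beta_j \tag{5}$$

where $k$ is the iteration number and $\Delta\beta_j$ is the shift parameter.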

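At each iteration step, the model is linearized by approximation to the first-order Taylor series expansion about $\boldsymbol{\beta}^{k}$:

$$f(x_i, \boldsymbol{\beta}) \approx f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j=1}^{l} \left(\beta_j - \beta_j^{k}\right) \frac{\partial f(x_i, \boldsymbol{\beta}^{k})}{\partial \beta_j} = f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j=1}^{l} J_{ij} \Delta\beta_j \tag{6}$$

where $J_{ij}$ is the element of the Jacobian matrix $J$ and $\partial r_i / \partial \beta_j = -J_{ij}$.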

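Therefore, $r_i$ can be rewritten as:

$$r_i = \Delta y_i - \sum_{s=1}^{l} J_{is} \Delta\beta_s, \qquad \Delta y_i = y_i - f(x_i, \boldsymbol{\beta}^{k}) \tag{7}$$

By substituting the above expressions into the gradient equation in Eq. (5), we obtain the normal equations and their matrix notation:

$$\sum_{i=1}^{m} \sum_{s=1}^{l} J_{ij} J_{is} \Delta\beta_s = \sum_{i=1}^{m} J_{ij} \Delta y_i, \qquad \left(J^{T} J\right) \Delta\boldsymbol{\beta} = J^{T} \Delta\mathbf{y} \tag{8}$$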

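For a weighted NLS model, in which $W_{ii}$ denotes the weight attached to observation $i$ (the diagonal elements of the weight matrix $W$), $S$ is modified as follows:

$$S = \sum_{i=1}^{m} W_{ii} r_i^{2}, \qquad m \ge l \tag{9}$$

Therefore, the matrix notation of the normal equations for a weighted NLS model is expressed as follows:

$$\left(J^{T} W J\right) \Delta\boldsymbol{\beta} = J^{T} W \Delta\mathbf{y} \tag{10}$$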

These aforementioned equations form the basis of the Gauss-Newton algorithm for solving an NLS problem.
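The iteration defined by Eqs. (5) to (8) can be sketched in a few lines of Python. This is a minimal illustration on a toy two-parameter model $y = \beta_1 e^{\beta_2 x}$ with synthetic, noise-free data; it is not the accident model of Eq. (3), and the function name, data, and starting point are assumptions made for the example. The unweighted normal equations are solved in closed form for the 2 × 2 case.

```python
import math


def gauss_newton(xs, ys, beta, iterations=20):
    """Fit y = b1 * exp(b2 * x) by Gauss-Newton, solving (J^T J) d = J^T dy."""
    b1, b2 = beta
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b2 * x)
            dy = y - b1 * e          # Delta y_i = y_i - f(x_i, beta^k)
            j1 = e                   # J_i1 = df/db1
            j2 = b1 * x * e          # J_i2 = df/db2
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * dy
            g2 += j2 * dy
        det = a11 * a22 - a12 * a12
        d1 = (a22 * g1 - a12 * g2) / det   # shift for b1
        d2 = (a11 * g2 - a12 * g1) / det   # shift for b2
        b1, b2 = b1 + d1, b2 + d2          # beta^{k+1} = beta^k + Delta beta
    return b1, b2


# Synthetic data generated from b1 = 2.0, b2 = 0.5 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
print(gauss_newton(xs, ys, beta=(1.5, 0.4)))
```

Gauss-Newton is only locally convergent, so the starting point matters; a damped or Levenberg-Marquardt variant would be a common safeguard in practice.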

In fact, the Poisson regression model shown as Eq. (11) is a natural choice for modeling accident occurrence:

$$Poi(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots \tag{11}$$

where $Poi(X = k)$ is the probability of $k$ accidents occurring, $k \in \mathbb{N}$, and $\lambda$ is the expectation value of the number of accidents.

However, [23] indicates that accident frequency is likely to be over-dispersed (see Eq. (12)) and suggests using the negative binomial (NB) regression model as an alternative to the Poisson model:

$$VAR(X) \begin{cases} > E(X), & \text{for over-dispersed data} \\ < E(X), & \text{for under-dispersed data} \end{cases} \tag{12}$$

The NB model as a special case of Poisson-Gamma mixture model is a variant of the Poisson model designed to deal with over-dispersed data [11, 24, 25]. The over-dispersion could come from several possible sources, e.g., omitted variables, uncertainty in exposure data, covariates, or nonhomogeneous LX environment [26]. The NB model considered in this study has the following expression:

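$$P_{NB}(X = k) = \frac{\Gamma\left(k + \frac{1}{\alpha}\right)}{\Gamma(k + 1)\,\Gamma\left(\frac{1}{\alpha}\right)} \left(\frac{1}{1 + \alpha\lambda}\right)^{1/\alpha} \left(\frac{\alpha\lambda}{1 + \alpha\lambda}\right)^{k}, \qquad k = 0, 1, 2, \ldots \tag{13}$$

where $P_{NB}(X = k)$ is the probability of $k$ accidents occurring, $k \in \mathbb{N}$, $\alpha$ is the dispersion parameter, and $\lambda$ is the expectation of the number of accidents.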

The relationship between the mean value and the variance in the NB model is given as follows:

$$VAR(X) = \alpha E(X)^{2} + E(X) \tag{14}$$

If $\alpha < 0$, there is under-dispersion; if $\alpha > 0$, there is over-dispersion; in the case where $\alpha = 0$, the NB model reduces to the Poisson model.
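The NB probabilities of Eq. (13) and the mean-variance relationship of Eq. (14) can be checked numerically. The sketch below is an illustration only: the function names and the values λ = 0.5 and α = 0.5 are arbitrary choices, and the infinite sums are truncated at a large k.

```python
import math


def nb_pmf(k: int, lam: float, alpha: float) -> float:
    """P_NB(X = k) from Eq. (13), evaluated on the log scale for stability."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             - r * math.log1p(alpha * lam)
             + k * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    return math.exp(log_p)


def nb_moments(lam: float, alpha: float, k_max: int = 500):
    """Total probability, mean, and variance from the truncated distribution."""
    probs = [nb_pmf(k, lam, alpha) for k in range(k_max + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return total, mean, var


lam, alpha = 0.5, 0.5  # illustrative values, not estimates from the chapter
total, mean, var = nb_moments(lam, alpha)
print("total probability:", total)
print("mean:", mean, "variance:", var)
print("alpha * mean^2 + mean:", alpha * mean ** 2 + mean)
```

The numerically computed variance should agree with $\alpha E(X)^2 + E(X)$, which is the over-dispersion relationship used to motivate the NB model.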

In practice, the count data may contain extra zeros relative to the Poisson or NB distribution. In this case, the ZIP or ZINB regression model is useful for analyzing such data [27]. The ZIP model is expressed as follows:

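$$P_{ZIP}(X = k) = \begin{cases} \omega + (1 - \omega)\exp(-\lambda), & \text{for } k = 0 \\ (1 - \omega)\exp(-\lambda)\,\lambda^{k} / k!, & \text{for } k > 0 \end{cases} \tag{15}$$

where $P_{ZIP}(X = k)$ is the probability of $k$ accidents occurring, $k \in \mathbb{N}$, $\lambda$ is the expectation value of the number of accidents, and $\log\frac{\omega}{1 - \omega} = z\gamma$ is the zero-inflation (ZI) link function, in which $z$ is the ZI covariate and $\gamma$ is the corresponding ZI coefficient. The mean value and variance of the ZIP model are $E(X) = (1 - \omega)\lambda$ and $VAR(X) = (1 - \omega)\lambda(1 + \omega\lambda)$.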

The ZINB model is expressed as follows:

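$$P_{ZINB}(X = k) = \begin{cases} \omega + (1 - \omega)(1 + \alpha\lambda)^{-1/\alpha}, & \text{for } k = 0 \\ (1 - \omega)\dfrac{\Gamma\left(k + \frac{1}{\alpha}\right)}{\Gamma(k + 1)\,\Gamma\left(\frac{1}{\alpha}\right)} \left(\dfrac{1}{1 + \alpha\lambda}\right)^{1/\alpha} \left(\dfrac{\alpha\lambda}{1 + \alpha\lambda}\right)^{k}, & \text{for } k > 0 \end{cases} \tag{16}$$

where $P_{ZINB}(X = k)$ is the probability of $k$ accidents occurring, $k \in \mathbb{N}$, and $\lambda$ is the expectation value of the number of accidents. The mean value and variance of the ZINB model are $E(X) = (1 - \omega)\lambda$ and $VAR(X) = (1 - \omega)\lambda(1 + \omega\lambda + \alpha\lambda)$. The ZINB reduces to the ZIP in the limit $\alpha \to 0$.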
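The stated means and variances of the ZIP and ZINB models can be verified numerically. The following Python sketch is illustrative only: the function names and the parameter values (λ = 2, ω = 0.3, α = 0.5) are assumptions chosen for the check, and the sums are truncated at a large k.

```python
import math


def zip_pmf(k: int, lam: float, omega: float) -> float:
    """Zero-inflated Poisson probability, Eq. (15)."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (omega if k == 0 else 0.0) + (1.0 - omega) * poisson


def zinb_pmf(k: int, lam: float, omega: float, alpha: float) -> float:
    """Zero-inflated negative binomial probability, Eq. (16)."""
    r = 1.0 / alpha
    log_nb = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
              - r * math.log1p(alpha * lam)
              + k * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    return (omega if k == 0 else 0.0) + (1.0 - omega) * math.exp(log_nb)


def mean_and_variance(pmf, k_max: int = 400):
    """Mean and variance of a count distribution truncated at k_max."""
    probs = [pmf(k) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


lam, omega, alpha = 2.0, 0.3, 0.5  # illustrative values only
print("ZIP :", mean_and_variance(lambda k: zip_pmf(k, lam, omega)),
      "closed form:", ((1 - omega) * lam, (1 - omega) * lam * (1 + omega * lam)))
print("ZINB:", mean_and_variance(lambda k: zinb_pmf(k, lam, omega, alpha)),
      "closed form:", ((1 - omega) * lam,
                       (1 - omega) * lam * (1 + omega * lam + alpha * lam)))
```

Both models share the same mean, $(1 - \omega)\lambda$; the extra $\alpha\lambda$ term in the ZINB variance is what accommodates over-dispersion beyond that caused by the excess zeros.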

However, the NB and ZINB models are limited in their ability to handle under-dispersed data ($\alpha < 0$) [11]. That is why [13] proposed the Gamma model to handle under-dispersed samples. The Gamma model is given as follows:

$$P_{G}(X = k) = Gamma(\beta k, \lambda) - Gamma(\beta(k + 1), \lambda) \tag{17}$$

where $P_{G}(X = k)$ is the probability of $k$ accidents occurring, $k \in \mathbb{N}$, $\lambda$ is the expectation of the number of accidents, and $\beta$ is the dispersion parameter. If $\beta > 1$, there is under-dispersion; if $\beta < 1$, there is over-dispersion; and if $\beta = 1$, the Gamma model reduces to the Poisson model. However, the Gamma model, whose $Gamma(\beta k, \lambda)$ term is defined in Eq. (18), is limited by its time-dependent observation assumption and in its handling of zero observations, since the general $\Gamma(x)$ restricts discrete responses to positive values:

$$Gamma(\beta k, \lambda) = \begin{cases} 1, & \text{for } k = 0 \\ \dfrac{1}{\Gamma(\beta k)} \displaystyle\int_{0}^{\lambda} u^{\beta k - 1} e^{-u}\, du, & \text{for } k > 0 \end{cases} \tag{18}$$
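As a short consistency check of the claim that the Gamma model reduces to the Poisson model for β = 1, assume Eq. (17) is read as the difference $P_G(X = k) = Gamma(\beta k, \lambda) - Gamma(\beta(k + 1), \lambda)$ (this reading of the operators is an assumption, since the extracted formula does not show them). For integer $k > 0$, repeated integration by parts gives the standard identity

$$Gamma(k, \lambda) = \frac{1}{\Gamma(k)} \int_{0}^{\lambda} u^{k-1} e^{-u}\, du = 1 - e^{-\lambda} \sum_{j=0}^{k-1} \frac{\lambda^{j}}{j!},$$

so that, for $k > 0$,

$$Gamma(k, \lambda) - Gamma(k + 1, \lambda) = e^{-\lambda} \frac{\lambda^{k}}{k!},$$

and, for $k = 0$,

$$Gamma(0, \lambda) - Gamma(1, \lambda) = 1 - \left(1 - e^{-\lambda}\right) = e^{-\lambda}.$$

Both cases coincide with the Poisson probabilities of Eq. (11).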

According to the above discussion, the relationship between the mean value and the variance can be used to identify an appropriate regression model. Therefore, we first carried out a preliminary variance analysis by means of group classification. Namely, the annual accidents at a given SAL2 during the 10 years were divided into 100 groups with the same number of samples in each group. Then, the variance and mean value of accidents in each group were calculated to analyze the relationship between the group variance and the group mean value. The variance analysis shows that the variance and mean value are very close to each other. Hence, we performed detailed analyses to assess the NLS regression, the Poisson regression, the ZIP regression, the NB regression, and the ZINB regression methods with regard to SAL2 LXs in our accident dataset so as to identify which model is most effective.
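The group-wise variance analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic rare-event counts (not the SNCF Réseau records), the records are grouped in their given order because the chapter does not specify how groups were formed, and the function name is hypothetical.

```python
import random
import statistics


def group_mean_variance(counts, n_groups):
    """Split counts into equally sized consecutive groups; return (mean, variance) per group."""
    size = len(counts) // n_groups
    result = []
    for g in range(n_groups):
        chunk = counts[g * size:(g + 1) * size]
        result.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return result


# Synthetic stand-in for annual accident counts (illustrative only):
# each record is 1 with probability 0.02 and 0 otherwise.
random.seed(0)
counts = [1 if random.random() < 0.02 else 0 for _ in range(10_000)]

for mean, var in group_mean_variance(counts, n_groups=5):
    ratio = var / mean if mean > 0 else float("nan")
    print(f"mean={mean:.4f}  variance={var:.4f}  variance/mean={ratio:.3f}")
```

A variance-to-mean ratio near 1 in most groups points toward a Poisson-like distribution, whereas ratios well above 1 would indicate over-dispersion and favor the NB family.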

2.2.2 Regression modeling results

NLS regression:

When applying the NLS regression, the form of $\lambda_{10Y}$ is given by Eq. (3). The estimated coefficients computed by NLS regression are provided in Table 3. The criterion |t-statistic| > 1.96 is used to identify the significant parameters, corresponding to a 95% confidence level. As a result, the railway speed limit, the average daily railway traffic, the average daily road traffic, the annual road accidents, the LX-accident-prone region, the road alignment, the LX width, and the crossing length are shown to have a significant and positive influence on SAL2 accident frequency. However, the test shows that the road profile is not a significant factor (t-statistic = 0.635 ≪ 1.96); thus, the impact of road profile can be neglected. Moreover, the coefficients of the variables that enter in exponential form reflect how sensitive the SAL2 accident frequency is to each of these variables. According to these sensitivities (ranks indicated in brackets), the LX-accident-prone region factor is the most sensitive contributor among these variables.

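| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K$ | 2.703e-05 | 5.078e-06 | 5.322 | × |
| $I_{Profile}$ | $C_{Profile}$ | 3.626e-02 | 5.706e-02 | 0.635 | |
| $I_{Align}$ | $C_{Align}$ | 3.427e-01 (2) | 2.942e-02 | 11.648 | × |
| $Wid$ | $C_{Wid}$ | 9.847e-02 (3) | 1.494e-02 | 6.589 | × |
| $Leng$ | $C_{Leng}$ | 2.084e-02 (4) | 4.284e-03 | 4.865 | × |
| $RSL$ | $C_{RSL}$ | 3.089e-03 (5) | 7.586e-04 | 4.072 | × |
| $F_{Reg}$ | $C_{Reg}$ | 4.962e-01 (1) | 1.722e-01 | 2.882 | × |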

Table 3.

Results of the λ10Y NLS regression model.
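The significance screening applied to Table 3 can be reproduced approximately with a few lines of Python. The sketch assumes the t-statistic is the ratio of the estimate to its standard error; the values are the rounded figures printed in Table 3, so the recomputed statistics may differ from the reported ones in the last digit. The variable and function names are illustrative.

```python
# (coefficient, estimated value, standard error) as printed in Table 3.
TABLE3_ROWS = [
    ("K", 2.703e-05, 5.078e-06),
    ("C_Profile", 3.626e-02, 5.706e-02),
    ("C_Align", 3.427e-01, 2.942e-02),
    ("C_Wid", 9.847e-02, 1.494e-02),
    ("C_Leng", 2.084e-02, 4.284e-03),
    ("C_RSL", 3.089e-03, 7.586e-04),
    ("C_Reg", 4.962e-01, 1.722e-01),
]


def t_statistic(estimate: float, std_error: float) -> float:
    """Ratio of the estimate to its standard error."""
    return estimate / std_error


def is_significant(estimate: float, std_error: float, critical: float = 1.96) -> bool:
    """Two-sided test at the 95% level: |t| must exceed the critical value."""
    return abs(t_statistic(estimate, std_error)) > critical


for name, est, se in TABLE3_ROWS:
    print(f"{name:10s} t = {t_statistic(est, se):7.3f}  significant = {is_significant(est, se)}")
```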

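In order to assess the predictive accuracy of accident occurrence estimated by the NLS regression model $\lambda_{10Y}$ combined with the NB and ZINB distributions (see Section 3.1), we adopt the maximum likelihood estimation (MLE) method to estimate the dispersion parameter $\alpha$ of the dataset [28]. As expressed by Eq. (19) and Eq. (20), the values of $\alpha$ in the NB and ZINB distributions are estimated, respectively, by solving $\partial l / \partial \alpha = 0$ numerically in R:

$$l(\alpha)_{NB} = \ln \prod_{i}^{n} P_{NB}(X_i = y_i) = \sum_{i}^{n} \left( y_i \ln\lambda_i - \left(y_i + \alpha^{-1}\right) \ln(1 + \alpha\lambda_i) + \sum_{v=0}^{y_i - 1} \ln(1 + \alpha v) \right) \tag{19}$$

$$l(\alpha)_{ZINB} = \ln \prod_{i}^{n} P_{ZINB}(X_i = y_i) = \sum_{i}^{n} \begin{cases} \ln\left[\omega_i + (1 - \omega_i)\left(\dfrac{1}{1 + \alpha\lambda_i}\right)^{1/\alpha}\right], & \text{if } y_i = 0 \\ \ln(1 - \omega_i) + \ln\Gamma\left(\dfrac{1}{\alpha} + y_i\right) - \ln\Gamma(1 + y_i) - \ln\Gamma\left(\dfrac{1}{\alpha}\right) + \dfrac{1}{\alpha} \ln\dfrac{1}{1 + \alpha\lambda_i} + y_i \ln\left(1 - \dfrac{1}{1 + \alpha\lambda_i}\right), & \text{if } y_i > 0 \end{cases} \tag{20}$$

In Eq. (19), the term $-\ln(y_i!)$ is omitted because it does not depend on $\alpha$.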
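The estimation of the NB dispersion parameter can be sketched in Python as follows. Note the swapped-in technique: instead of solving $\partial l / \partial \alpha = 0$ as done in R in the chapter, this sketch maximizes the NB log-likelihood directly with a golden-section search over $\log\alpha$, which assumes the likelihood has a single peak in the search interval. The counts are a small made-up sample, every record is given the same expected frequency, and all names are hypothetical.

```python
import math


def nb_log_likelihood(alpha, counts, lams):
    """Full NB log-likelihood (sum of ln P_NB from Eq. (13)) for a given alpha."""
    r = 1.0 / alpha
    total = 0.0
    for y, lam in zip(counts, lams):
        total += (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  - r * math.log1p(alpha * lam)
                  + y * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    return total


def estimate_alpha(counts, lams, low=1e-3, high=100.0, tol=1e-8):
    """Golden-section search for the maximizing alpha, performed on log(alpha)."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(low), math.log(high)

    def f(u):
        return nb_log_likelihood(math.exp(u), counts, lams)

    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)


# Made-up over-dispersed sample (illustrative only): 100 records, mostly zeros.
counts = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [10] * 2
lams = [sum(counts) / len(counts)] * len(counts)  # same expected frequency for all

alpha_hat = estimate_alpha(counts, lams)
print("estimated alpha:", alpha_hat)
print("log-likelihood at estimate:", nb_log_likelihood(alpha_hat, counts, lams))
```

In the chapter's setting, the expected frequencies would come from the fitted prediction model for each crossing and year rather than from a common sample mean.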

Poisson regression:

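When applying the Poisson regression, the general form of $\lambda_{10Poi}$ is given by $\exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\right)$. Therefore, we need to transform Eq. (3) into the following expression:

$$\lambda_{10Poi} = \begin{cases} 0, & \text{if } F_{RAcc} = 0,\ V = 0 \text{ or } T = 0 \\ \exp\left(K_1 + C_F \times F_{RAcc} + C_{CM} \times CM + C_{Profile} \times I_{Profile} + C_{Align} \times I_{Align} + C_{Wid} \times Wid + C_{Leng} \times Leng + C_{RSL} \times RSL + C_{Reg} \times F_{Reg}\right), & \text{if } F_{RAcc} \ne 0,\ V \ne 0, \text{ and } T \ne 0 \end{cases} \tag{21}$$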

The results estimated through the Poisson regression approach are shown in Table 4. As in the NLS case, the road profile is not significant (|t-statistic| = 0.621 ≪ 1.96). On the other hand, when the road accident factor $F_{RAcc}$ enters in exponential form, its impact is weakened: it is not significant under the Poisson regression approach (t-statistic = 1.913 < 1.96). Furthermore, according to the sensitivities of the parameters in exponential form (ranks indicated in brackets), the LX-accident-prone region factor is once again the most sensitive contributor among these parameters.

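| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −9.562 | 0.440 | −21.714 | × |
| $F_{RAcc}$ | $C_F$ | 0.636 | 0.332 | 1.913 | |
| $CM$ | $C_{CM}$ | 0.005 (6) | 2.949e-04 | 17.144 | × |
| $I_{Profile}$ | $C_{Profile}$ | −0.076 | 0.122 | −0.621 | |
| $I_{Align}$ | $C_{Align}$ | 0.326 (2) | 0.069 | 4.756 | × |
| $Wid$ | $C_{Wid}$ | 0.206 (3) | 0.026 | 8.051 | × |
| $Leng$ | $C_{Leng}$ | 0.030 (4) | 0.009 | 3.232 | × |
| $RSL$ | $C_{RSL}$ | 0.011 (5) | 0.001 | 7.895 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.725 (1) | 0.334 | 5.165 | × |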

Table 4.

Regression results of λ10Poi.

NB regression:

When applying the NB regression, the general form of $\lambda_{10NB}$ is given by $\exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j + \varepsilon\right)$, and it is again expressed by Eq. (21). The dispersion parameter $\alpha$ is estimated automatically at 3.2394 by the iterative estimation algorithm. The estimated results of the NB regression are shown in Table 5. According to these results, the road profile is still not significant (|t-statistic| = 0.850 ≪ 1.96). The impact of $F_{RAcc}$ in exponential form is not significant either when using the NB regression approach (t-statistic = 1.793 < 1.96). Moreover, according to the sensitivities of the parameters in exponential form (ranks indicated in brackets), the LX-accident-prone region factor is still the most sensitive contributor among these parameters.

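| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −9.424 | 0.457 | −20.615 | × |
| $F_{RAcc}$ | $C_F$ | 0.616 | 0.343 | 1.793 | |
| $CM$ | $C_{CM}$ | 0.006 (6) | 3.762e-04 | 16.493 | × |
| $I_{Profile}$ | $C_{Profile}$ | −0.107 | 0.126 | −0.850 | |
| $I_{Align}$ | $C_{Align}$ | 0.298 (2) | 0.072 | 4.159 | × |
| $Wid$ | $C_{Wid}$ | 0.199 (3) | 0.028 | 7.173 | × |
| $Leng$ | $C_{Leng}$ | 0.031 (4) | 0.010 | 3.201 | × |
| $RSL$ | $C_{RSL}$ | 0.010 (5) | 0.001 | 7.034 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.508 (1) | 0.351 | 4.294 | × |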

Table 5.

Regression results of λ10NB.

ZIP regression:

When applying the ZIP regression, the general form of $\lambda_{10ZIP}$ is given by $\exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\right)$, and it is again expressed by Eq. (21). The estimated results of the ZIP regression are shown in Table 6 (for nonzero observations) and Table 7 (for zero-inflation observations).

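| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −1.128e+01 | 7.586e-01 | −14.867 | × |
| $F_{RAcc}$ | $C_F$ | 3.717e-01 | 4.202e-01 | 0.885 | |
| $CM$ | $C_{CM}$ | 6.221e-03 (4) | 4.336e-04 | 14.347 | × |
| $I_{Profile}$ | $C_{Profile}$ | −1.855e-01 | 1.513e-01 | −1.226 | |
| $I_{Align}$ | $C_{Align}$ | 1.483e-01 | 8.786e-02 | 1.688 | |
| $Wid$ | $C_{Wid}$ | 4.397e-01 (2) | 6.625e-02 | 6.636 | × |
| $Leng$ | $C_{Leng}$ | 3.971e-02 | 1.725e-02 | 1.904 | |
| $RSL$ | $C_{RSL}$ | 1.432e-02 (3) | 2.069e-03 | 6.921 | × |
| $F_{Reg}$ | $C_{Reg}$ | 2.319 (1) | 6.655e-01 | 3.484 | × |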

Table 6.

Count model regression results of λ10ZIP.

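| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −1.574e+01 | 4.276 | −3.680 | × |
| $F_{RAcc}$ | $C_F$ | −1.104 | 1.646 | −0.671 | |
| $CM$ | $C_{CM}$ | 1.584e-03 | 1.450e-03 | 1.093 | |
| $I_{Profile}$ | $C_{Profile}$ | −4.355e-01 | 6.531e-01 | 0.505 | |
| $I_{Align}$ | $C_{Align}$ | −1.185 | 6.141e-01 | −1.931 | |
| $Wid$ | $C_{Wid}$ | 1.024 (2) | 2.241e-01 | 4.571 | × |
| $Leng$ | $C_{Leng}$ | 8.231e-02 | 4.190e-02 | 1.964 | |
| $RSL$ | $C_{RSL}$ | 4.117e-02 (3) | 1.449e-02 | 2.840 | × |
| $F_{Reg}$ | $C_{Reg}$ | 5.861 (1) | 1.748 | 3.353 | × |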

Table 7.

Zero-inflation model regression results of λ10ZIP.

According to the results of the ZIP regression approach, in the count (nonzero) model, $F_{RAcc}$, $I_{Profile}$, $I_{Align}$, and $Leng$ are not significant (|t-statistic| < 1.96). Moreover, according to the sensitivities of the remaining significant parameters in exponential form (ranks indicated in brackets), the LX-accident-prone region factor is still the most sensitive contributor among these parameters. In the zero-inflation model, only $Wid$, $RSL$, and $F_{Reg}$ are significant (|t-statistic| > 1.96).

ZINB regression:

When applying the ZINB regression, the general form of $\lambda_{10ZINB}$ is given by $\exp\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j + \varepsilon\right)$, and it is again expressed by Eq. (21). The values of the dispersion parameter $\alpha$ for nonzero observations and zero-inflation observations are estimated automatically at 3.8102 and 1.4069, respectively, by the iterative estimation algorithm. The estimated results of the ZINB regression are shown in Table 8 (for nonzero observations) and Table 9 (for zero-inflation observations). According to these results, in the count (nonzero) model, $CM$, $I_{Align}$, and $Wid$ are significant (|t-statistic| > 1.96). According to the ranks reported in brackets in Table 8 for these three parameters, the road alignment is the most sensitive contributor among them, followed by the LX width. In the zero-inflation model, only $F_{RAcc}$ and $CM$ are significant (|t-statistic| > 1.96).

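| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −7.128 | 0.734 | −9.709 | × |
| $F_{RAcc}$ | $C_F$ | 0.671 | 0.413 | 1.624 | |
| $CM$ | $C_{CM}$ | 4.486e-03 (3) | 4.991e-04 | 8.990 | × |
| $I_{Profile}$ | $C_{Profile}$ | −5.886e-02 | 0.144 | −0.406 | |
| $I_{Align}$ | $C_{Align}$ | 0.371 (1) | 8.274e-02 | 4.495 | × |
| $Wid$ | $C_{Wid}$ | 0.145 (2) | 4.558e-02 | 3.175 | × |
| $Leng$ | $C_{Leng}$ | 3.219e-03 | 1.203e-02 | 0.268 | |
| $RSL$ | $C_{RSL}$ | 2.558e-03 | 1.954e-03 | 1.309 | |
| $F_{Reg}$ | $C_{Reg}$ | 0.795 | 0.446 | 1.783 | |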

Table 8.

Count model regression results of λ10ZINB.

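| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_1$ | −4.036 | 2.190 | −6.709 | × |
| $F_{RAcc}$ | $C_F$ | 0.260 (1) | 1.456 | 2.179 | × |
| $CM$ | $C_{CM}$ | 6.685e-02 (2) | 1.838e-02 | 3.636 | × |
| $I_{Profile}$ | $C_{Profile}$ | 0.705 | 0.544 | 1.296 | |
| $I_{Align}$ | $C_{Align}$ | 0.535 | 0.328 | 1.632 | |
| $Wid$ | $C_{Wid}$ | 8.873e-02 | 0.180 | 0.491 | |
| $Leng$ | $C_{Leng}$ | 0.114 | 6.639e-02 | 1.725 | |
| $RSL$ | $C_{RSL}$ | 5.456e-03 | 6.629e-03 | 0.823 | |
| $F_{Reg}$ | $C_{Reg}$ | 1.632 | 1.679 | 0.972 | |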

Table 9.

Zero-inflation model regression results of λ10ZINB.


3. Model performance evaluation and discussion

In this section, we will assess the performance of our prediction models while determining an appropriate statistical distribution to be combined with the models, in such a way as to ensure the most accurate estimation of the probability of accidents occurring at a given SAL2 in a given year. The Bayesian information criterion (BIC) [29], Akaike’s information criterion (AIC) [30], the Pearson chi-square statistic (PCS) test [31], and the degree of freedom (DF) are used to evaluate the goodness of fit (GOF) of the model. They can be respectively expressed as follows:

$$BIC = n + n \times \ln(2\pi) + n \times \ln(RSS / n) + (l + 1)\ln n \tag{22}$$

$$AIC = n + n \times \ln(2\pi) + n \times \ln(RSS / n) + 2(l + 1) \tag{23}$$

$$PCS = \sum_{i=1}^{n} \frac{(O_i - \lambda_i)^{2}}{\lambda_i} \tag{24}$$

$$DF = n - (l + 1) \tag{25}$$

where $RSS$ is the sum of the squares of residuals between the annual accident frequencies observed and the annual accident frequencies estimated, $n$ is the sample size, $l$ is the number of independent exponential parameters, $\lambda_i$ is the annual accident frequency expected, and $O_i$ is the annual accident frequency observed.

The BIC and AIC are used to test the relative quality of models for a given dataset. Smaller BIC and AIC values indicate a better model fitting. The PCS test is used to determine if there is a significant difference between the values expected and the values observed. The PCS is roughly equal to DF if the model fits the data perfectly without any dispersion. Namely, the closer the PCS is to the DF, the better the model fits the data [14].
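The four goodness-of-fit quantities of Eqs. (22) to (25) translate directly into code. The following Python sketch is illustrative: the function names are chosen for this example, and the small observed/expected vectors are made up. Only the sample size (8332 crossings × 10 years) and the parameter counts are taken from the chapter.

```python
import math


def bic(rss: float, n: int, l: int) -> float:
    """Bayesian information criterion, Eq. (22)."""
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + (l + 1) * math.log(n)


def aic(rss: float, n: int, l: int) -> float:
    """Akaike's information criterion, Eq. (23)."""
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + 2 * (l + 1)


def pearson_chi_square(observed, expected) -> float:
    """Pearson chi-square statistic, Eq. (24)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n: int, l: int) -> int:
    """Degrees of freedom, Eq. (25)."""
    return n - (l + 1)


n = 8332 * 10  # crossings times years of observation
print("DF with 6 exponential parameters:", degrees_of_freedom(n, 6))
print("DF with 8 exponential parameters:", degrees_of_freedom(n, 8))

# Made-up observed and expected frequencies, for illustration only.
print("PCS:", pearson_chi_square([1, 0, 2], [1.0, 0.5, 2.0]))
```

With n = 83,320, six exponential parameters give DF = 83,313 and eight give DF = 83,311, which matches the DF row of Table 10.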

The log-likelihood statistic test (LL) is adopted to assess the GOF of the accident frequency prediction model combined with a statistical distribution. The larger the LL, the more preferred the model [14]. The mathematical expression of the LL is given as follows:

$$LL = \sum_{i=1}^{n} \ln \hat{P}_i \tag{26}$$

where n is the sample size and P̂i is the estimated probability of accident frequency observed. P̂i is computed respectively according to the accident frequency prediction model combined with the Poisson or the NB distribution.
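A minimal Python sketch of the LL statistic in Eq. (26) is given below, with the estimated probabilities taken from either the Poisson distribution (Eq. (11)) or the NB distribution (Eq. (13)). The function names, the observed counts, the expected frequencies, and the value α = 0.5 are all made up for illustration.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """ln of the Poisson probability in Eq. (11)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def nb_log_pmf(k: int, lam: float, alpha: float) -> float:
    """ln of the NB probability in Eq. (13)."""
    r = 1.0 / alpha
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            - r * math.log1p(alpha * lam)
            + k * (math.log(alpha * lam) - math.log1p(alpha * lam)))


def log_likelihood(observed, expected, log_pmf) -> float:
    """LL = sum over records of ln P(observed count | expected frequency)."""
    return sum(log_pmf(k, lam) for k, lam in zip(observed, expected))


# Made-up data: observed annual counts and model-expected frequencies.
observed = [0, 0, 1, 0, 2]
expected = [0.05, 0.10, 0.40, 0.02, 0.30]

print("LL (Poisson):", log_likelihood(observed, expected, poisson_log_pmf))
print("LL (NB, alpha=0.5):",
      log_likelihood(observed, expected, lambda k, lam: nb_log_pmf(k, lam, 0.5)))
```

Since each term is the logarithm of a probability, LL is never positive, and values closer to zero indicate a better fit.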

3.1 Model performance comparison among variants of λ10Y

The results of the AIC, BIC, and PCS statistical tests are shown in Table 10, with the goodness rank given in brackets. The following findings are obtained: (1) considering AIC and BIC, the $\lambda_{10Y}$ model gives better results, since its AIC and BIC values are smaller than those of the $\lambda_{10Poi}$, $\lambda_{10NB}$, $\lambda_{10ZIP}$, and $\lambda_{10ZINB}$ models; (2) in terms of the PCS test, the $\lambda_{10Y}$ model is also the most effective one, since its PCS is the closest to DF (the DFs of $\lambda_{10Y}$, $\lambda_{10Poi}$, $\lambda_{10NB}$, $\lambda_{10ZIP}$, and $\lambda_{10ZINB}$ are very close to one another).

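| Test | POI-$\lambda_{10Y}$ | NB-$\lambda_{10Y}$ | $\lambda_{10Poi}$ | $\lambda_{10NB}$ | $\lambda_{10ZIP}$ | $\lambda_{10ZINB}$ |
| --- | --- | --- | --- | --- | --- | --- |
| AIC | −190,744 (1) | −190,744 (1) | −187,804 (5) | −189,942 (2) | −188,312 (4) | −189,826 (3) |
| BIC | −190,670 (1) | −190,670 (1) | −187,720 (5) | −189,858 (3) | −188,176 (4) | −189,935 (2) |
| PCS | 65,796 (1) | 65,796 (1) | 125,495 (5) | 123,715 (4) | 118,185 (3) | 110,496 (2) |
| DF | 83,313 | 83,313 | 83,311 | 83,311 | 83,311 | 83,311 |
| LL | −2599 (2) | −2596 (1) | −2732 (6) | −2711 (5) | −2701 (4) | −2631 (3) |
| Goodness score (the lower, the better) | 5 | 4 | 21 | 14 | 15 | 10 |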

Table 10.

Model GOF comparison among variants of λ10Y.
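The goodness score in Table 10 appears to be the sum of the bracketed ranks over the AIC, BIC, PCS, and LL tests; the chapter does not state this explicitly, so the following Python sketch should be read as a check of that assumption. The ranks are copied from Table 10, and the dictionary keys are plain-text labels chosen for this example.

```python
# Bracketed ranks from Table 10, per model and test.
RANKS = {
    "POI-lambda10Y": {"AIC": 1, "BIC": 1, "PCS": 1, "LL": 2},
    "NB-lambda10Y": {"AIC": 1, "BIC": 1, "PCS": 1, "LL": 1},
    "lambda10Poi": {"AIC": 5, "BIC": 5, "PCS": 5, "LL": 6},
    "lambda10NB": {"AIC": 2, "BIC": 3, "PCS": 4, "LL": 5},
    "lambda10ZIP": {"AIC": 4, "BIC": 4, "PCS": 3, "LL": 4},
    "lambda10ZINB": {"AIC": 3, "BIC": 2, "PCS": 2, "LL": 3},
}

# Assumed scoring rule: add up the ranks; a lower total means a better model.
scores = {model: sum(ranks.values()) for model, ranks in RANKS.items()}

for model, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{model:14s} goodness score = {score}")
```

Under this assumption, the rank sums reproduce the goodness scores reported in the last row of Table 10.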

The LL test results are also shown in Table 10. For the $\lambda_{10Y}$ model combined with either the Poisson or the NB distribution, the GOF is significantly better than that of the $\lambda_{10Poi}$ and $\lambda_{10NB}$ models according to the LL test. Furthermore, the GOF of $\lambda_{10Y}$ combined with the NB distribution (NB-$\lambda_{10Y}$) is better than when it is combined with the Poisson distribution (POI-$\lambda_{10Y}$).

3.2 A comparison between λ10Y and two existing reference models

In this section, we compare the present model $\lambda_{10Y}$ with two other models that are widely used in existing related works. As mentioned in Section 1, the first widely used model is given in Eq. (1) [13, 14, 18]. In our study, this model can be specified as follows:

$$\lambda_{TV} = \exp\left(K_2 + C_V \times V + C_T \times T + C_F \times F_{RAcc} + C_{Profile} \times I_{Profile} + C_{Align} \times I_{Align} + C_{Wid} \times Wid + C_{Leng} \times Leng + C_{RSL} \times RSL + C_{Reg} \times F_{Reg}\right) \tag{27}$$

where the average daily road traffic V and the average daily railway traffic T are applied separately in exponential form.

The second model as shown in Eq. (2) (e.g., [17, 32]) is specified as Eq. (28) in our study:

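$$\lambda_{Mon} = \exp\left(K_3 + C_M \times \ln(V \times T) + C_F \times F_{RAcc} + C_{Profile} \times I_{Profile} + C_{Align} \times I_{Align} + C_{Wid} \times Wid + C_{Leng} \times Leng + C_{RSL} \times RSL + C_{Reg} \times F_{Reg}\right) \tag{28}$$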

where the conventional traffic moment V×T is applied.
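To make the link between Eq. (28) and Eq. (2) explicit, the logarithmic term can be pulled out of the exponential. Writing $\Sigma$ as shorthand (introduced here only for brevity) for the remaining linear terms $C_F \times F_{RAcc} + \ldots + C_{Reg} \times F_{Reg}$:

$$\lambda_{Mon} = \exp\left(K_3 + C_M \ln(V \times T) + \Sigma\right) = (V \times T)^{C_M} \exp\left(K_3 + \Sigma\right) = V^{C_M}\, T^{C_M} \exp\left(K_3 + \Sigma\right)$$

This has the form of Eq. (2) with $\beta_1 = C_M$ and $\sigma = K_3$. It also shows that the conventional traffic moment forces road and railway traffic to share a single exponent $C_M$, whereas the corrected moment in Eq. (3) assigns them different exponents ($a = 0.354$ and $b = 0.646$).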

It should be noted that the ZIP and ZINB models were also investigated for $\lambda_{TV}$ and $\lambda_{Mon}$, but they yielded no better goodness of fit and only a small number of significant parameters compared with the Poisson and NB models; hence, they are not reported in this section. The Poisson and NB regression results for $\lambda_{TV}$ and $\lambda_{Mon}$ are shown in Tables 11–14, respectively. The impacts of road profile and road accidents are still not significant in $\lambda_{TV}$ and $\lambda_{Mon}$. The AIC, BIC, PCS, and LL test results are given in Table 15. According to the quality test results discussed in Section 3.1, $\lambda_{10Y}$ combined with the NB distribution (NB-$\lambda_{10Y}$) shows the best prediction performance among the investigated combinations. Therefore, in the following, we only compare NB-$\lambda_{10Y}$ with $\lambda_{TV}$ and $\lambda_{Mon}$ combined with the Poisson and NB distributions, respectively.

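| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_2$ | −9.807 | 0.413 | −22.223 | × |
| $V$ | $C_V$ | 1.098e-04 (7) | 1.613e-05 | 6.811 | × |
| $T$ | $C_T$ | 8.777e-03 (6) | 1.115e-03 | 7.869 | × |
| $F_{RAcc}$ | $C_F$ | 0.636 | 0.333 | 1.913 | |
| $I_{Profile}$ | $C_{Profile}$ | −1.445e-01 | 1.209e-01 | −1.195 | |
| $I_{Align}$ | $C_{Align}$ | 3.319e-01 (2) | 6.747e-02 | 4.919 | × |
| $Wid$ | $C_{Wid}$ | 2.059e-01 (3) | 2.483e-02 | 8.292 | × |
| $Leng$ | $C_{Leng}$ | 3.952e-02 (4) | 7.868e-03 | 5.024 | × |
| $RSL$ | $C_{RSL}$ | 1.154e-02 (5) | 1.487e-03 | 7.759 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.750 (1) | 3.463e-01 | 5.053 | × |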

Table 11.

Poisson regression results of λTV.

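| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_2$ | −9.882 | 4.531e-01 | −21.810 | × |
| $V$ | $C_V$ | 1.155e-04 (7) | 1.683e-05 | 6.861 | × |
| $T$ | $C_T$ | 9.152e-03 (6) | 1.234e-03 | 7.416 | × |
| $F_{RAcc}$ | $C_F$ | 0.607 | 3.402e-01 | 1.784 | |
| $I_{Profile}$ | $C_{Profile}$ | −1.532e-01 | 1.243e-01 | −1.232 | |
| $I_{Align}$ | $C_{Align}$ | 3.240e-01 (2) | 6.988e-02 | 4.636 | × |
| $Wid$ | $C_{Wid}$ | 2.212e-01 (3) | 2.579e-02 | 8.575 | × |
| $Leng$ | $C_{Leng}$ | 3.895e-02 (4) | 8.415e-03 | 4.629 | × |
| $RSL$ | $C_{RSL}$ | 1.160e-02 (5) | 1.529e-03 | 7.589 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.739 (1) | 3.575e-01 | 4.864 | × |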

Table 12.

NB regression results of λTV.

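| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_3$ | −11.816 | 4.540e-01 | −26.023 | × |
| $\ln(V \times T)$ | $C_M$ | 4.036e-01 (2) | 2.776e-02 | 14.538 | × |
| $F_{RAcc}$ | $C_F$ | 6.359e-01 | 3.325e-01 | 1.913 | |
| $I_{Profile}$ | $C_{Profile}$ | −6.279e-02 | 1.205e-01 | −0.521 | |
| $I_{Align}$ | $C_{Align}$ | 2.875e-01 (3) | 6.799e-02 | 4.228 | × |
| $Wid$ | $C_{Wid}$ | 1.185e-01 (4) | 3.296e-02 | 3.596 | × |
| $Leng$ | $C_{Leng}$ | 2.213e-02 (5) | 9.530e-03 | 2.322 | × |
| $RSL$ | $C_{RSL}$ | 8.811e-03 (6) | 1.350e-03 | 6.527 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.446 (1) | 3.358e-01 | 4.307 | × |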

Table 13.

Poisson regression results of λMon.

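| Parameter | Coefficient | Estimated value | Standard error | t-statistic | Significant |
| --- | --- | --- | --- | --- | --- |
| | $K_3$ | −11.850 | 4.628e-01 | −26.603 | × |
| $\ln(V \times T)$ | $C_M$ | 4.034e-01 (2) | 2.822e-02 | 14.297 | × |
| $F_{RAcc}$ | $C_F$ | 6.368e-01 | 3.382e-01 | 1.883 | |
| $I_{Profile}$ | $C_{Profile}$ | −7.103e-02 | 1.230e-01 | −0.578 | |
| $I_{Align}$ | $C_{Align}$ | 2.848e-01 (3) | 6.960e-02 | 4.092 | × |
| $Wid$ | $C_{Wid}$ | 1.214e-01 (4) | 3.361e-02 | 3.612 | × |
| $Leng$ | $C_{Leng}$ | 2.204e-02 (5) | 9.752e-03 | 2.260 | × |
| $RSL$ | $C_{RSL}$ | 8.892e-03 (6) | 1.368e-03 | 6.500 | × |
| $F_{Reg}$ | $C_{Reg}$ | 1.480 (1) | 3.428e-01 | 4.316 | × |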

Table 14.

NB regression results of λMon.

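| Test | NB-$\lambda_{10Y}$ | POI-$\lambda_{TV}$ | NB-$\lambda_{TV}$ | POI-$\lambda_{Mon}$ | NB-$\lambda_{Mon}$ |
| --- | --- | --- | --- | --- | --- |
| AIC | −190,744 (1) | −177,914 (5) | −179,842 (4) | −183,714 (3) | −186,532 (2) |
| BIC | −190,670 (1) | −177,610 (5) | −179,738 (4) | −183,587 (3) | −186,191 (2) |
| PCS | 65,796 (1) | 121,715 (5) | 119,133 (4) | 118,511 (3) | 115,634 (2) |
| DF | 83,313 | 83,310 | 83,310 | 83,311 | 83,311 |
| LL | −2596 (1) | −2722 (5) | −2703 (3) | −2705 (4) | −2683 (2) |
| Goodness score (the lower, the better) | 4 | 20 | 15 | 13 | 8 |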

Table 15.

Model GOF comparison among λ10Y, λTV, and λMon.

As shown in Table 15, the λ10Y model outperforms the λTV and λMon models on the AIC, BIC, and PCS criteria. In terms of the LL test, NB-λ10Y is again the preferred model, and it obtains the lowest overall goodness score (4).
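The goodness scores in Table 15 are consistent with summing, for each model, the ranks shown in parentheses for the four ranked tests. The Python sketch below recomputes the ranks from the tabulated values under that reading. It assumes that lower AIC, BIC, and PCS values and higher LL values are better, and that the DF row is not ranked; the function names are illustrative only.

```python
# Values copied from Table 15, in the column order of the table.
MODELS = ["NB-λ10Y", "POI-λTV", "NB-λTV", "POI-λMon", "NB-λMon"]

# test name -> (values per model, True if a lower value is better)
CRITERIA = {
    "AIC": ([-190744, -177914, -179842, -183714, -186532], True),
    "BIC": ([-190670, -177610, -179738, -183587, -186191], True),
    "PCS": ([65796, 121715, 119133, 118511, 115634], True),
    "LL":  ([-2596, -2722, -2703, -2705, -2683], False),
}


def ranks(values, lower_is_better):
    """Rank the models on one test: 1 is best, len(values) is worst."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i],
                   reverse=not lower_is_better)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def goodness_scores(criteria, models=MODELS):
    """Sum the per-test ranks of each model (the lower, the better)."""
    totals = [0] * len(models)
    for values, lower_is_better in criteria.values():
        for i, rank in enumerate(ranks(values, lower_is_better)):
            totals[i] += rank
    return dict(zip(models, totals))


if __name__ == "__main__":
    for model, score in goodness_scores(CRITERIA).items():
        print(f"{model:9s} {score}")
```

A rank sum of this kind weights the four tests equally, so a model that ranks first on every test, as NB-λ10Y does here, reaches the minimum possible score.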

4. Conclusions

Based on our study, some remarks need to be highlighted as follows:

  1. The proposed corrected traffic moment is more effective in estimating the frequency of automobile-involved LX accidents than the conventional traffic moment, the average daily railway traffic alone, or the average daily road traffic alone. It is worth mentioning that the average daily railway traffic, with a power of 0.646, has a more decisive impact on LX accident frequency than the average daily road traffic, with a power of 0.354. Moreover, the higher the combined exposure to railway and roadway traffic, the higher the likelihood of an accident.

  2. According to the analyses above, the form of λ10Y highlights the impact of the road accident factor FRAcc, whereas this impact is neglected in the λ10Poi, λ10NB, λTV, and λMon models (see Tables 4, 5, and 11–14). The impact of road accidents on the risk level was likely to be ignored in previous studies on LX safety analysis.

  3. We introduce, for the first time, the region LX-accident-prone factor (see Table 2) to account for the variation of LX accident statistics across regions. According to the sensitivity ranking of the variables in Table 3, among the LX characteristics, the risk of LX accidents is most sensitive to the region LX-accident-prone factor. However, many past studies neglect the impact of the region in which an LX is located. In fact, accident history varies from one region to another, and it affects LX accident frequency to a correspondingly different degree in each region.
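As an illustration of the first remark, the following Python sketch compares the conventional traffic moment with a corrected traffic moment. It assumes that V denotes the average daily road traffic and T the average daily railway traffic, and that the corrected moment takes the product form V^0.354 × T^0.646 implied by the two powers quoted above. The two crossings and their traffic volumes are hypothetical and are not taken from the study data.

```python
ROAD_POWER = 0.354  # power on average daily road traffic (from remark 1)
RAIL_POWER = 0.646  # power on average daily railway traffic (from remark 1)


def conventional_moment(road_traffic, rail_traffic):
    """Conventional traffic moment: plain product of the two daily volumes."""
    return road_traffic * rail_traffic


def corrected_moment(road_traffic, rail_traffic):
    """Corrected traffic moment, assuming the product form V^0.354 * T^0.646."""
    return road_traffic ** ROAD_POWER * rail_traffic ** RAIL_POWER


# Hypothetical crossings (road vehicles per day, trains per day).
CROSSING_A = (1000, 10)   # busy road, light railway traffic
CROSSING_B = (100, 100)   # lighter road, heavier railway traffic

if __name__ == "__main__":
    for name, (road, rail) in (("A", CROSSING_A), ("B", CROSSING_B)):
        print(f"crossing {name}: "
              f"conventional = {conventional_moment(road, rail):.0f}, "
              f"corrected = {corrected_moment(road, rail):.2f}")
```

Both hypothetical crossings have the same conventional moment, yet under the assumed form the corrected moment is larger for crossing B, which reflects the heavier weight placed on railway traffic.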

To sum up, the developed model λ10Y has trustworthy goodness of fit. Moreover, it shows relatively high accuracy in predicting LX accident frequency when combined with the NB distribution.

References

  1. Read GJM, Salmon PM, Lenné MG, Stanton NA. Walking the line: Understanding pedestrian behaviour and risk at rail level crossings with cognitive work analysis. Applied Ergonomics. 2016;53:209-227
  2. Ghazel M. Using stochastic petri nets for level-crossing collision risk assessment. IEEE Transactions on Intelligent Transportation Systems. 2009;10(4):668-677
  3. Liu B, Ghazel M, Toguyéni A. Model-based diagnosis of multi-track level crossing plants. IEEE Transactions on Intelligent Transportation Systems. 2016;17(2):546-556
  4. ERA. Railway safety performance in the European Union. 9(2) Agency Regulation 881/2004/EC. 2014
  5. Ghazel M, El-Koursi EM. Two-half-barrier level crossings versus four-half-barrier level crossings: A comparative risk analysis study. IEEE Transactions on Intelligent Transportation Systems. 2014;15(3):1123-1133
  6. Silmon J, Roberts C. Using functional analysis to determine the requirements for changes to critical systems: Railway level crossing case study. Reliability Engineering and System Safety. 2010;95(3):216-225
  7. Australian Transport Safety Bureau. Australian Rail Safety Occurrence Data: 1 July 2002 to 30 June 2012 (ATSB Transport Safety Report RR-2012-010). Canberra, Australia: ATSB; 2012
  8. SNCF Réseau. World Conference of Road Safety at Level Crossings. 2011. Available from: http://www.planetoscope.com/automobile/1271-nombre-de-collisions-aux-passages-a-niveau-en-france.html
  9. Plesse G. Des détecteurs d'obstacles déployés aux passages à niveau. 2017. Available from: http://www.leparisien.fr/info-paris-ile-de-france-oise/transports/des-detecteurs-d-obstacles-deployes-aux-passages-a-niveau-02-06-2017-7011714.php
  10. Liang C, Ghazel M, Cazier O, El-Koursi EM. Analyzing risky behavior of motorists during the closure cycle of railway level crossings. Safety Science. 2018;110:115-126
  11. Lord D, Mannering F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice. 2010;44(5):291-305
  12. Guikema SD, Quiring SM. Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data. Reliability Engineering and System Safety. 2012;99:178-182
  13. Oh J, Washington SP, Nam D. Accident prediction model for railway-highway interfaces. Accident Analysis & Prevention. 2006;38(2):346-356
  14. Lu P, Tolliver D. Accident prediction model for public highway-rail grade crossings. Accident Analysis & Prevention. 2016;90:73-81
  15. Medina JC, Benekohal RF. Macroscopic models for accident prediction at railroad grade crossings: Comparisons with US Department of Transportation accident prediction formula. Transportation Research Record: Journal of the Transportation Research Board. 2015;2476:85-93
  16. Chadwick SG, Zhou N, Saat MR. Highway-rail grade crossing safety challenges for shared operations of high-speed passenger and heavy freight rail in the US. Safety Science. 2014;68:128-137
  17. Miranda-Moreno L, Fu L, Saccomanno FF, Labbe A. Alternative risk models for ranking locations for safety improvement. Transportation Research Record: Journal of the Transportation Research Board. 2005;1908:1-8
  18. Austin RD, Carson JL. An alternative accident prediction model for highway-rail interfaces. Accident Analysis & Prevention. 2002;34(1):31-42
  19. Liang C, Ghazel M, Cazier O, El-Koursi EM. Risk analysis on level crossings using a causal Bayesian network based approach. Transportation Research Procedia. 2017;25:2172-2186
  20. Liang C, Ghazel M, Cazier O, El-Koursi EM. Developing accident prediction model for railway level crossings. Safety Science. 2018;101:48-59
  21. SNCF Réseau. SNCF. Statistical Analysis of Accidents at LXs. France: SNCF Réseau; 2010
  22. Madsen K, Nielsen HB, Tingleff O. Methods for non-linear least squares problems. In: Informatics and Mathematical Modelling. 2nd ed. Denmark: Technical University of Denmark; 2004
  23. Chang LY. Analysis of freeway accident frequencies: Negative binomial regression versus artificial neural network. Safety Science. 2005;43(8):541-557
  24. Buddhavarapu P, Scott JG, Prozzi JA. Modeling unobserved heterogeneity using finite mixture random parameters for spatially correlated discrete count data. Transportation Research Part B: Methodological. 2016;91:492-510
  25. Utkin LV, Coolen FPA, Gurov SV. Imprecise inference for warranty contract analysis. Reliability Engineering and System Safety. 2015;138:31-39
  26. Miaou SP. The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident Analysis & Prevention. 1994;26(4):471-482
  27. Ridout M, Hinde J, Demétrio C. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219-223
  28. Dai H, Bao Y, Bao M. Maximum likelihood estimate for the dispersion parameter of the negative binomial distribution. Statistics & Probability Letters. 2013;83(1):21-27
  29. Weakliem DL. A critique of the Bayesian information criterion for model selection. Sociological Methods & Research. 1999;27(3):359-397
  30. Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika. 1987;52(3):345-370
  31. Pearson K. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157-175
  32. Saccomanno FF, Fu L, Miranda-Moreno L. Risk-based model for identifying highway-rail grade crossing blackspots. Transportation Research Record: Journal of the Transportation Research Board. 2004;1862:127-135
