Fuzzy Clustering Approach for Accident Black Spot Centers Determination

Traffic accident rates in Turkey are higher than those of most European Union countries and many other countries in the world (Table 1). Every year, around 8000 people die in traffic accidents in Turkey. This figure is very high compared with many countries of similar size. Many projects have been conducted by national and international organizations to decrease these rates. In order to develop sustainable prevention models, accidents should be analyzed in detail, considering their primary and secondary causes.


Introduction
Traffic accident rates in Turkey are higher than those of most European Union countries and many other countries in the world (Table 1). Every year, around 8000 people die in traffic accidents in Turkey. This figure is very high compared with many countries of similar size. Many projects have been conducted by national and international organizations to decrease these rates. In order to develop sustainable prevention models, accidents should be analyzed in detail, considering their primary and secondary causes.
Traffic accident data can be analyzed in different ways, depending on the amount and type of data. The analysis is not complicated if the data are smooth and not dispersed, but it is not an easy task if the data are scattered. Although there is no general definition of a black spot, locations where more than one accident has occurred are treated as black spots. Under this definition, the number of black spots can grow large, which makes their analysis more difficult.

[Table 1 columns: Country, Number. Source: Murat and Sekerler, 2009.]
Several methods can be used to determine black spots and their centers. They can be determined by eye using simple observations, but this approach is open to subjective perception, and its results may be neither precise nor scientific. For a scientific analysis, other attributes of black spots should be taken into consideration besides their locations. When black spots are dense and cover the whole map, classifying them by their characteristics and developing countermeasures is not an easy task. Some black spots with common characteristics can be located far away from each other, whereas black spots located close to each other can have different characteristics. The definition and analysis of black spots therefore involve uncertainties, and conventional approaches are not well suited to this purpose.
In this study, cluster analysis is used to determine and define the centers of black spots. Two types of analysis, k-means and fuzzy c-means, are used. The (hard) k-means clustering approach is used as the conventional method. In k-means clustering, the boundaries of clusters are crisp. Thus a black spot whose characteristics relate it to two clusters is assigned to only one of them, although it could be regarded as a member of both. To remove this deficiency, the fuzzy c-means clustering approach is used, which represents the uncertainty in belonging to clusters. Some black spots are thus treated as members of two centers, with two membership (belonging) values. In addition to determining the black spot centers, the factors associated with the black spots are analyzed and discussed.
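The difference between crisp and fuzzy assignment described above can be illustrated with a minimal Python sketch. The two centers and the accident location are hypothetical values, the function names are illustrative, and the membership formula is the standard fuzzy c-means one with an assumed weight exponent m = 2; distinct centers are assumed.

```python
import math


def hard_assignment(point, centers):
    """Index of the nearest center, as in (hard) k-means."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one point in every cluster (standard FCM formula)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


centers = [(0.0, 0.0), (10.0, 0.0)]  # two hypothetical black spot centers
spot = (4.0, 0.0)                    # a hypothetical accident location between them

label = hard_assignment(spot, centers)  # k-means keeps only the nearest center
u = fuzzy_memberships(spot, centers)    # FCM keeps a degree of belonging to both
```

In this sketch the crisp label discards the fact that the location is also fairly close to the second center, while the two membership values retain it and sum to 1.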
In this study, the black spots are first determined using data provided by the Local Police Department; then the centers where black spots are concentrated are identified. These centers and the black spots around them are examined in detail, and the common causes behind the findings are investigated.

Literature review
There have been numerous studies on traffic safety. In this study, only GIS-based studies and some important studies that include cluster analysis are taken into account and summarized.
Cluster analysis has been used in modeling traffic accident data by many researchers. Wong et al. (2004) used cluster analysis to develop a qualitative assessment methodology. Yannis et al. (2007) proposed a multilevel negative binomial modeling approach for the regional effect of enforcement on road accidents in Greece using cluster analysis. They performed geographical and mathematical cluster analyses and reported that alcohol enforcement is the most significant among the various types of enforcement.
Abdel-Aty and Radwan (2000) used the negative binomial distribution for modeling traffic accident occurrence and involvement. The models indicated that young and older drivers experience more accidents than middle-aged drivers under heavy traffic volume and with reduced shoulder and median widths. They also found that heavy traffic volume, speeding, narrow lane width, a larger number of lanes, urban roadway sections, narrow shoulder width and reduced median width increase the likelihood of accident involvement. Ng et al. (2002) aimed to develop an algorithm to estimate the number of traffic accidents and assess the risk of traffic accidents in Hong Kong. They presented an algorithm that combines a mapping technique (Geographical Information System (GIS) techniques) with statistical methods. The results showed that the algorithm improves accident risk estimation compared with risk estimated from the historical accident records alone. Abdel-Aty (2003) analyzed driver injury severity using an ordered probit modeling approach. The models showed the significance of the driver's age, gender and seat belt use, the point of impact, speed and vehicle type for the injury severity level. An estimation model for the collision risks of motor vehicles and bicycles was developed by Wang and Nihan (2004). They classified the accidents by the movements of traffic flows, such as through, right turning and left turning. A probability-based method was used, and three negative binomial regression models were developed. The study showed that the negative binomial regression approach can be used instead of the Poisson regression approach. Abdel-Aty and Pande (2007) investigated crash data at two levels (i.e. the collective and the individual level). They focused on real-time estimation of crash likelihood and discussed the advantages and disadvantages of analysis at the two levels.
Saplıoğlu and Karaşahin (2006) examined traffic accidents in Isparta, Turkey using Geographical Information Systems. They determined black spots and found that the number of black spots increased over the years. They also emphasized that most of the accidents occurred at junctions. Another remarkable study was carried out in Singapore. Kamalasudhan et al. (2000) obtained an accident density map using digital accident data and searched for black spots (hot spots) using this map. The types of accidents by day, hour, pavement condition and vehicle type were analyzed in the study.
Most of the studies in the literature consider traffic accidents individually. However, the analysis of density (i.e. areas with densely recorded accidents) and the determination of black spots are also very important for traffic safety research. Determining black spots and their centers is not an easy task and requires many trials. To handle this problem, a cluster analysis approach is used in this research.
The main objectives of this paper are to determine the centers of black spots by k-means and fuzzy clustering approaches and to analyze these points in order to reveal the main causes of accidents.

Cluster analysis approaches
In recent years, cluster analysis has been widely used in engineering applications such as civil engineering, target recognition and medical diagnosis. Cluster analysis is an unsupervised method for classifying data, i.e. for dividing a given data set into a set of classes or clusters.

Conventional (K-Means) clustering
The k-means clustering approach is one of the most popular methods used in industrial and scientific areas. The k-means algorithm uses the Euclidean distance. The desired number of clusters must be specified at the beginning. The approach seeks to minimize the following objective function.

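$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \qquad (1)$$
where $\left\| x_i^{(j)} - c_j \right\|^2$ is the distance between the data point $x_i^{(j)}$ and the corresponding center $c_j$, $k$ is the number of clusters, $n$ is the number of data points, and $J$ is the total distance.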
The following steps are used in k-means clustering approach.

1. Select the number of cluster centers (k).
2. Assign each object to the group of the nearest cluster center.
3. After all objects have been assigned, recalculate the locations of the cluster centers.
4. Repeat steps 2 and 3 until the cluster centers no longer move.
One of the main disadvantages of this approach is that the number of clusters must be determined at the beginning. Another disadvantage is the sensitivity of the algorithm to outliers.
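The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the starting centers are drawn at random from the data (the text does not specify an initialization), the function names and the accident coordinates are hypothetical, and only the standard library is used.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Hard k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: k starting centers drawn from the data
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]  # step 2: nearest center
        new_centers = []
        for j in range(k):  # step 3: each center moves to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its center
        if new_centers == centers:  # step 4: stop once the centers are fixed
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    # Total squared distance of every point to its own center (the objective J).
    total = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, total


# Hypothetical accident coordinates forming two well-separated groups.
accidents = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, J = kmeans(accidents, k=2)
```

Because every point receives exactly one label, a location lying between two centers is forced into a single cluster, which is the crisp-boundary limitation discussed in the introduction.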

Fuzzy C-Means clustering
The Fuzzy C-Means (FCM) clustering algorithm has been widely used and applied in different areas. The original fuzzy clustering algorithm based on an objective function dates back to 1973: it was conceived by Dunn (1974) and further generalized by Bezdek (1973). Among the existing fuzzy clustering methods, the FCM algorithm proposed by Bezdek (1981) is the simplest and most popular clustering technique. It extends the hard k-means algorithm to a fuzzy framework. Grubesic (2006) explored the use of a generalized partitioning method known as fuzzy clustering for crime hot-spot detection.
The FCM algorithm extends hard k-means with fuzzy set theory. In contrast to k-means, FCM is more flexible because it reveals the objects that are associated with more than one cluster in the partition. In traditional clustering algorithms such as hard k-means, an element either belongs fully to a cluster or not at all (i.e. its membership is 0 or 1).
In fuzzy clustering, on the other hand, each element can belong to several clusters with different membership degrees. The main goal of any clustering algorithm is to determine the appropriate partition matrix U(X) of a given data set X consisting of N patterns ($X = \{x_1, x_2, \dots, x_N\}$) and to find the appropriate number of clusters. The objective function and constraints can be defined as follows.
Objective function:
$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, d^2(x_k, v_i)$$
Constraints:
$$u_{ik} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for all } k, \qquad 0 < \sum_{k=1}^{N} u_{ik} < N \ \text{for all } i$$
where $c$ is the number of clusters, $v_i$ is the centroid of cluster $i$, $d(x_k, v_i)$ is the Euclidean distance between the rescaled feature vector $x_k$ and the centroid of cluster $i$, $u_{ik} \in [0, 1]$ denotes the degree of membership of feature vector $x_k$ in cluster $i$, and $m \in [1, \infty)$ is the weight exponent for each fuzzy membership, which determines the fuzziness of the clusters and controls the extent of membership shared among the fuzzy clusters. U, which is given in equation (7), is the fuzzy partition matrix, which contains the membership of each feature vector in each fuzzy cluster. It should be noted that the membership values of each feature vector, summed over all clusters, must equal 1.
$$U = \begin{bmatrix} u_{11} & \cdots & u_{1N} \\ \vdots & \ddots & \vdots \\ u_{c1} & \cdots & u_{cN} \end{bmatrix} \qquad (7)$$
The procedure of FCM based on iterative optimization (Bezdek, 1981) can be given as follows.
i. Initialize the fuzzy partition matrix U or the fuzzy cluster centroid matrix V using a random number generator.
ii. If the FCM algorithm is initialized with the fuzzy partition matrix, the initial memberships are adjusted using equation (8):
$$u_{ik} = \frac{u_{ik}^{\text{initial}}}{\sum_{j=1}^{c} u_{jk}^{\text{initial}}} \qquad (8)$$
iii. If the FCM algorithm is initialized with the fuzzy cluster centroid matrix, the memberships are determined using equation (10).
iv. The fuzzy centroid $v_i$ is computed by equation (9):
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m} \, x_k}{\sum_{k=1}^{N} u_{ik}^{m}} \qquad (9)$$
v. The fuzzy membership $u_{ik}$ is updated by equation (10), which requires $m > 1$:
$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d(x_k, v_i)}{d(x_k, v_j)} \right)^{2/(m-1)}} \qquad (10)$$
Steps (iv) and (v) are repeated until the change in the membership values between two iterations is sufficiently small.
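The iterative FCM procedure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: it assumes initialization from a random partition matrix, a weight exponent m = 2, a stopping tolerance of 1e-6 and hypothetical accident coordinates, none of which are specified in the text.

```python
import math
import random


def fcm(points, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy c-means; returns the partition matrix U (c rows, N columns) and centroids V."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random partition matrix, then each column is normalized to sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col
    V = []
    for _ in range(max_iter):
        # Centroids: averages of the points weighted by u_ik ** m.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            V.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / total
                           for d in range(dim)))
        # Membership update from the distances to the new centroids.
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dists = [math.dist(points[k], V[i]) for i in range(c)]
            zeros = [i for i, d in enumerate(dists) if d == 0.0]
            for i in range(c):
                if zeros:  # the point coincides with one or more centroids
                    new_U[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    new_U[i][k] = 1.0 / sum((dists[i] / dj) ** (2.0 / (m - 1.0))
                                            for dj in dists)
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if change < tol:  # memberships have stopped changing
            break
    return U, V


# Hypothetical accident coordinates forming two well-separated groups.
accidents = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
U, V = fcm(accidents, c=2)
# Hardened labels: the cluster with the largest membership for each point.
hard = [max(range(2), key=lambda i: U[i][k]) for k in range(len(accidents))]
```

Each column of U keeps a degree of belonging to every cluster, so a location between two centers is not forced into one of them; the hardened labels are computed here only to compare the result with a crisp partition.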

Validation
The main problem in fuzzy clustering is that the number of clusters (c) must be specified beforehand. Different numbers of initial clusters result in different clustering partitions. Therefore, each fuzzy partition must be validated after the cluster analysis. Cluster validity refers to the problem of whether a given fuzzy partition fits the data at all. The clustering algorithm always tries to find the best fit for a fixed number of clusters and the parameterized cluster shapes. However, this does not mean that even the best fit is meaningful. Either the number of clusters might be wrong, or the cluster shapes might not correspond to the groups in the data, if the data can be grouped in a meaningful way at all. In this study, several cluster validity indexes were tested for different values of the cluster number (c) to examine their adequacy in the analysis of traffic accidents. These indexes are Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB) and Dunn's Index (DI).
Partition Coefficient (PC) measures the amount of "overlapping" between fuzzy clusters (Bezdek, 1981). The disadvantage of this index is its lack of direct connection to properties of the data. The optimal number of clusters is at the maximum value of the index, and its range is $[1/c, 1]$.
Classification Entropy (CE) measures the fuzziness of the cluster partition. The range of CE is $[0, \log_a c]$, and the optimal number of clusters is at its minimum value.
Partition Index (SC) is the ratio of the sum of compactness and separation of the clusters. It is a sum of individual cluster validity measures normalized through division by the fuzzy cardinality of each cluster. SC is useful for comparing different partitions that have an equal number of clusters; a lower value of this index indicates a better partition.
Separation Index (S) uses a minimum-distance separation for partition validity.
Xie and Beni's Index (XB) aims to quantify the ratio of the total variation within clusters to the separation of clusters. The optimal number of clusters is at the minimum value of this index.
Dunn's Index (DI) was proposed for identifying compact and well-separated clusters.
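Three of the indexes above can be sketched in Python from their commonly used definitions, which are assumed here because the formulas are not given in the text: PC is the mean of the squared memberships, CE is the mean membership entropy, and XB divides the membership-weighted within-cluster variation by N times the smallest squared distance between centroids (written with a general exponent m; Xie and Beni's original form uses m = 2). The function names and the small data sets are hypothetical.

```python
import math


def partition_coefficient(U):
    """PC: mean of squared memberships; 1 for a crisp partition, 1/c for the fuzziest."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def classification_entropy(U, base=math.e):
    """CE: mean membership entropy; 0 for a crisp partition, log_base(c) for the fuzziest."""
    n = len(U[0])
    return -sum(u * math.log(u, base) for row in U for u in row if u > 0.0) / n


def xie_beni(points, U, V, m=2.0):
    """XB: within-cluster variation divided by N times the minimum centroid separation."""
    n = len(points)
    variation = sum(U[i][k] ** m * math.dist(points[k], V[i]) ** 2
                    for i in range(len(V)) for k in range(n))
    separation = min(math.dist(V[i], V[j]) ** 2
                     for i in range(len(V)) for j in range(i + 1, len(V)))
    return variation / (n * separation)


# Two extreme partitions of four points into two clusters (hypothetical).
crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
fuzziest = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]

pc_crisp = partition_coefficient(crisp)
pc_fuzzy = partition_coefficient(fuzziest)
ce_crisp = classification_entropy(crisp)
ce_fuzzy = classification_entropy(fuzziest)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
xb = xie_beni(points, crisp, centroids)
```

The two extreme partitions reproduce the ends of the ranges quoted above: PC moves between 1/c and 1, and CE between 0 and the logarithm of c.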

Study area and available data
In this study, Denizli, a medium-sized city in Turkey (current population about 700,000), is considered. Traffic accident records for the years 2004, 2005 and 2006 are used in analyzing accidents. All of the data collected from the accident reports are first recorded in MS Excel. Then, the coordinates of each accident point are determined using the street definition system in the MAPINFO software. The data, including the coordinates of the accident points, are transferred from the Excel database to the MAPINFO database. This database is built for accident analysis by querying different attributes of each accident. Thus, traffic accidents can be evaluated from different points of view, and the relations between the causes and results of accidents can be revealed by these analyses.
The data on the coordinates (locations) of accidents are used in the cluster analysis. Figure 1 shows a sample of the processed data on the GIS map of Denizli city (Murat et al., 2008). The data are analyzed using k-means and fuzzy clustering approaches.

Analysis of traffic accidents
In cluster analysis, the observed scales of the variables must be transformed so that their ranges are comparable, because clustering methods are sensitive to scale differences. Therefore, the variables were rescaled between 0 and 1 using equation (18).
$$Y_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad (18)$$
where $Y_i$ is the rescaled value of the feature (e.g. latitude) at site $i$, $X_i$ is its observed value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum of the feature within the data set. These rescaled characteristics were employed as the basis for classifying the traffic accidents.
To determine the optimum number of clusters, the sensitivity of the FCM results to the number of clusters (c) is examined by varying c from 2 to 9 in increments of 1. The variation of the objective function of the fuzzy c-means algorithm for Economically Damaged (ED) and ED+Injured accidents as the number of clusters ranges from 2 to 11 is shown in Figure 2. For the Dead and Injured accidents, Figure 3 shows the variation of the objective function of the FCM algorithm as the number of clusters ranges from 2 to 5.
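The rescaling of equation (18) can be sketched in Python as follows. The function name and the coordinate values are hypothetical and serve only to show the arithmetic; mapping a constant feature to zeros is an assumption of this sketch, since that case is not discussed in the text.

```python
def rescale(values):
    """Min-max rescaling of one feature to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to zeros (assumption).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical latitude and longitude values of three accident locations.
latitudes = [37.70, 37.78, 37.86]
longitudes = [29.02, 29.10, 29.06]

scaled_lat = rescale(latitudes)
scaled_lon = rescale(longitudes)
example = rescale([2.0, 4.0, 6.0])  # gives [0.0, 0.5, 1.0]
```

After rescaling, both coordinates span the same interval, so neither dominates the Euclidean distance used by k-means and FCM.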
It is seen in Figures 2 and 3 that the values of the objective function of the FCM algorithm generally decrease as the number of clusters increases. The optimal number of clusters in the data set is identified using the objective function and the fuzzy cluster validation indexes. The variations of the cluster validity indexes introduced earlier with the number of clusters were calculated for Economically Damaged accidents and are given in Figure 4. The main drawbacks of PC are its monotonic decrease with c and its lack of direct connection to the data. CE has the same problems: a monotonic increase with c and a hardly detectable connection to the data structure. It is seen in Figure 4 that SC and S decrease as the number of clusters increases. The SC, S and XB indexes reach their optimal values at c = 10 and 11. For the Economically Damaged (ED) traffic accident analysis, eleven clusters were chosen as the optimal number of clusters according to the optimal values of the objective function and validity indexes given in Figures 3 and 4. Table 2 gives the coordinates of the cluster centers for ED and injured type accidents obtained from the fuzzy cluster analysis (Murat and Sekerler, 2009). A similar procedure was carried out for the ED+Injured and Dead and Injured type accidents. Figure 5 shows the locations of the corresponding fuzzy clusters for traffic accidents in Denizli city.
The data are also analyzed by the conventional k-means clustering approach. In this analysis, seven clusters are obtained. Figure 6 depicts the results of the k-means clustering approach. Table 3 shows the coordinates of the cluster centers and the number of accidents for the clusters determined by the conventional k-means clustering analysis. The fuzzy clustering analysis provides more detail than the conventional k-means clustering approach.

As seen in Figure 6, the centers obtained are similar to those obtained by the fuzzy clustering approach. However, the fuzzy clustering approach provided four more clusters than the conventional k-means clustering approach. One important cluster (the center named Ucgen), which has the largest number of accidents, is not identified by the k-means clustering approach but is identified by the fuzzy c-means clustering approach.
The total number of accidents is covered by both clustering analyses, but the distribution of accidents among the clusters differs between the k-means and fuzzy c-means approaches. This results from the different numbers of clusters and the differences between the two analyses.
Fig. 6. Clusters obtained by the k-means clustering approach.

Results and discussion
Using cluster analysis, different types of traffic accidents are analyzed, and three sets of clusters are obtained, with four, seven and eleven clusters respectively. The following table shows the common points that can be considered black spots for the three sets of clusters. These points are also determined as the centers of the clusters. As seen in Table 3, the numbers of accidents confirm the black spot centers determined.
The location of black spots is important in the analysis, and urban sections and rural areas should be considered separately. However, it is difficult to draw a strict line when clustering accidents, because some accident locations can be assigned to more than one region or cluster. Therefore, the fuzzy clustering approach is preferred. The results show that the fuzzy clustering approach provided four more black spot centers than the k-means clustering approach. Three of them are located in urban areas and one in a rural area.
The black spots determined by cluster analysis are examined in detail with regard to the types of accident occurrence. The geometric and physical conditions of the black spots are also examined in detail, and the results are summarized as follows. The first black spot, Ucgen Intersection, is one of the most important intersections in Denizli city center. The main arterials from the cities of Ankara, İzmir and Antalya meet at this point. The Average Annual Daily Traffic volume of this intersection is very high, and congested traffic conditions are seen in peak hours. The Antalya road arm has some problems caused by its geometric design. On this arm, there is not enough weaving area after the previous intersection for vehicles that want to change lanes. Most of the accidents occur because of this fault. Another important problem is related to pedestrian crossings: pedestrians do not obey the rules while crossing the street.
Karayolları Intersection is determined as the second black spot. The main problem of this intersection is also related to geometric design. The intersection has five approaches, which makes optimizing the signal timing very hard. The main causes of accidents at this intersection are related to high speed and sight distance. Speed reduction and monitoring systems can reduce the number and severity of accidents.
Cınar is another important intersection of the city. This intersection serves most of the daily traffic circulating in the city. There are some problems with optimizing the signal timings. In addition, some drivers drive aggressively and cause traffic accidents. There is heavy pedestrian traffic in this area, and the traffic signal is designed as a filter for vehicles and pedestrians. The traffic safety problems arise from this design and can be solved by recalculating the intersection signal timings and changing the phase plans.
The main problems at Kiremitci intersection are related to intersection geometry and aggressive drivers. The problems can be solved by redesigning the intersection and optimizing the signal timings.
A different kind of black spot is 25. Cadde. This is an industrial area with many roads and intersections. The traffic composition in this area includes both heavy and light vehicles. Most of the accidents are seen at intersections, all of which are controlled by isolated systems. One of the serious problems is related to the design of intersections, especially some uncontrolled intersections.
One of the main advantages of fuzzy cluster analysis is that it handles the problem in an easy and practical way. The conventional approaches take a very long time and require many trials.
In fact, most of the causes of the problems at these black spots are similar. They are high traffic density, non-optimized signal timing, inconvenient pedestrian crossings, aggressive driving (caused by non-optimized control), geometric design faults (in weaving areas, taper design, etc.), inconvenient lane use and lane changing, and excessive speed. These problems can be addressed by developing policies and intervening in the system in line with international design standards. A sustainable solution, however, can be found by spreading and improving the "traffic concept" in society.
The fuzzy clustering approach presented in this study can be used to determine black spots instead of conventional approaches, and traffic safety policies can be developed based on a detailed analysis of these black spots. The fuzzy clustering approach can also be used to analyze other factors that affect traffic accidents, and interesting results may be obtained from such analyses.

Conclusion
The findings of this research can be used to investigate different effects on traffic accident occurrence, considering the characteristics of black spots and their centers. The safety levels of black spots can be determined in this way, and countermeasures can be developed using these safety levels. In addition, detailed risk analysis can be carried out at these centers, and investment planning priorities can be defined according to the level of safety risk. The fuzzy clustering analysis can be extended to consider other characteristics of black spots (such as accident type and occurrence type). The definition of black spots can then be reviewed after these analyses.

Acknowledgement
This research was supported by The Scientific and Technical Research Council of Turkey (TUBITAK) under project number 105G090. Part of the analysis presented was carried out by Mr. Alper Şekerler during the preparation of his MSc dissertation. This support is appreciated.

Fig. 2. Variation of the value of the objective function of FCM with the number of clusters for ED and ED+Injured type accidents.

Fig. 5. The location of the corresponding fuzzy clusters for traffic accidents in Denizli city (a) for 11 clusters, (b) for 4 clusters.

Table 1. Traffic accident rates.
The accident reports are provided by the Local Police Department. All of the data and documents are taken from an ongoing research project. Information for the analysis is collected from these reports.

Table 2. Coordinates of cluster centers given by fuzzy c-means clustering analysis.

Table 3. Coordinates of cluster centers given by conventional k-means analysis.

Table 3. The common black spots determined by cluster analysis.