Open access peer-reviewed chapter - ONLINE FIRST

Procedure to Prepare and Model Speed Data Considering the Traffic Infrastructure, as Part of a Cyber-Physical System

By José Gerardo Carrillo-González, Jacobo Sandoval-Gutiérrez and Francisco Pérez-Martínez

Submitted: May 21st 2019; Reviewed: June 27th 2019; Published: July 26th 2019

DOI: 10.5772/intechopen.88280

Abstract

This chapter investigates the relationship between traffic control infrastructure (traffic lights and speed bumps) and vehicles’ travel speeds, for certain hours and days of the week. The authors propose the following procedure: (1) street segmentation, (2) clustering and categorization of speed data, (3) comparative analysis of histograms, (4) outlier detection, (5) modeling, and (6) delivering info to the users. When speed histograms were compared, segments with matching infrastructure presented similarities, regardless of the day of the week. Two techniques were employed to model the data: polynomial regression and multinomial logistic regression. Algorithms to predict the travel speed category were also developed. The first technique categorizes on average 91.3% of the data correctly, and the second 90.09%. The traffic lights and speed bumps located on the street segments under consideration were identified as variables causing different travel speeds. The procedure can incorporate additional traffic elements and can also be applied to other geographical locations.

Keywords

  • cyber-physical system
  • speed bumps
  • street segments
  • traffic lights
  • travel speed

1. Introduction

Traffic conditions have a profound effect on the population’s quality of life. The TomTom traffic index states that in 2017, Mexico City had a travel delay of 66% when compared with normal times of uncongested traffic, ranking it first in the world. The wasted time per day was 59 min, or 227 h per year, with delays in the morning and evening peaks of about 100%. Of the 23 million private cars in Mexico, 72% correspond to metropolitan areas [1]. As a result, those areas are a suitable choice to analyze traffic behavior. In 2010, with a population of 20,116,842 and 0.3 cars per inhabitant (about 6,035,052 cars), the Mexico City Valley was the most crowded area in the country. The number of operating vehicles in a city reduces the average traveling speed and increases pollution [2, 3] and the number of car accidents [4, 5]. The zone under study in this work is located between Mexico City and Toluca, a region that is part of the Mexico City megalopolis, which makes the area a suitable candidate for analyzing traffic conditions. In this research we developed a procedure to analyze speed tendencies (by comparing histograms) and to prepare (set clusters and remove anomalies) and model speed data to be used in an application example: speed prediction. The procedure answers the following question: what is the pathway to generate new information when speed data is available?

Recently, there has been a great effort in studying and analyzing traffic data from different world locations. Travel speed is one way to measure traffic conditions, as is travel time. In [6], the travel time distribution for different kinds of roads is estimated for Beijing. The time intervals to analyze data were set to 15 min, and it was concluded that the best-fitting distribution depends on the congestion level and that the average travel time of all road segments (for all days) can be estimated with acceptable precision using the normal distribution (compared with the log-normal, gamma, and Weibull). In [7], travel time prediction is pursued. The variables considered were flow, concentration, and higher-order auto-regression, and the conclusion was that local linear regression is preferable to global modeling. Characterization of the daily temporal variation of congestion is presented in [8], where a fitted model and live data are combined in a ten-parameter exponential smoothing equation. With the purpose of analyzing historical traffic data, a query processing method with timeline information is proposed in [9], along with an analysis of the congestion dependency along roads. The work presented in [10] estimates the average link speed with vehicles equipped with GPS and establishes the quantity of equipped vehicles required for estimating the speed.

Using traffic data to make predictions is a current challenge, as services such as Google Maps traffic and Waze illustrate. The purpose in [11] is to use information from Bing Maps to analyze, visualize, and predict traffic jams in Chicago. In addition, a prediction model to correct flow intensities with logistic regression was proposed, where the independent variables were day, hour, street number, and number of pixels (red, yellow, and green). In that work, a tool was developed to extract the roads’ traffic intensity from a GIS map service, where colors represent flow intensity: red as congested, green as not congested, and yellow in between. In [12], the properties of a community-driven mapping service (Waze) are characterized. Additionally, the authors discuss the use of traffic data to identify traffic accidents and potholes. In [13], a four-phase traffic approach is proposed: (1) data collection and representation, (2) traffic prediction, (3) vehicle selection for re-routing, and (4) alternative route assignment. In our work, we focus our contribution on the first two phases.

The traffic infrastructure elements (such as traffic lights, speed bumps, potholes) involved in driving situations influence driver’s behavior, which in turn affects speed and number of accidents. The intention in [14] is the development of statistical models to predict accidents. These models correlate highway characteristics with traffic accidents. The variables considered were classified in groups: section identifiers, cross section related, location, traffic related (e.g., the percentage of trucks on a highway section), alignment, horizontal curvature, and accidents. The regression methods used were Poisson and negative binomial. The statistically significant variables were number of lanes, horizontal curvature, speed limit, tangent length, section length, average annual daily traffic, and peak hour. In addition, accidents are predicted with equations that consider roadway elements such as average daily traffic, commercial and residential units, intersections, speed limits, lane width, and number of lanes.

The work presented in [15] classifies traffic control elements (infrastructure) into three groups according to their effect on accidents. Group 1 contains the elements that reduce the number of accidents, such as speed limit signs, speed-reducing devices, signalized pedestrian crossings, urban play streets, pedestrian streets, traffic-calming areas, traffic signals at intersections, bus lines and bus stops, parking control, and access control. Group 2 has no statistical effect on accidents: road markings, one-way streets, reversible lanes, traffic control for pedestrians and cyclists, priority control, and yield signs at intersections. Group 3 increases accidents: right turn on red, pedestrian crossing without signs, blinking traffic light, and increasing speed limits. According to [16], the presence of traffic control elements intended to reduce speed or simplify the road users’ tasks (e.g., traffic signs) tends to reduce accidents. An obvious consequence of the presence of speed-reducing devices (humps, rumble strips, narrow road width, bollards) is the increase of travel time [17] and the decrease of the average travel speed. One of the conclusions in [15] is that the traffic control elements that reduce accidents also reduce mobility.

Traffic elements such as signals and traffic lights are important in human driving decisions. The work presented in [18] intends to determine the relevance of the static road elements in driving situations using Markov logic networks (MLNs). The information considered to determine the relevance of speed limits and supplementary signs was the position in relation to lanes, vehicle type, date, time, and weather. Then, with first-order logic rules, the relevance of each element was inferred. To determine the relevance of traffic lights, the following variables were considered: navigation system, environment perception, spatial relations, and the traffic light state.

The speed changes in the presence of speed bumps were analyzed in [19]. The speed limit on the streets under study is 50 km/h. The speed results measured at the bump location are as follows: about 30% of the cases show an 85th percentile speed higher than the posted speed limit, 26% lie in the range 45–50 km/h, and the rest are under 45 km/h. The 85th percentile speed (measured 20–25 m after the bumps’ location, at the crosswalk area) tends to increase in 50% of the tested sites, with a similar result for the 50th percentile case (45%). Nevertheless, for both cases the speed change was not significant, according to the statistical analysis. Another result was obtained by comparing the speed at bumps and 100 m away: in most sites, the 85th percentile speed decreases in the range of 1–18% (with respect to the zone without bumps). The statistical analysis concludes for both percentiles that speed values do not change significantly.

The use of cyber-physical systems in traffic is a current topic in the literature. In [20], a simulated vehicular cyber-physical system (VCPS) is designed for delivering warnings to the driver and avoiding accidents. To this end, the predicted vehicle motion/location, the driver behavior, and the road geometry were considered. Then, the short-term motion of the objective vehicle and the surrounding vehicles is predicted. With the objective vehicle location and the traveled distance among vehicles, the collision risk is estimated, and the driver is notified. In [21], a perceptual control architecture of cyber-physical systems (CPSs) is proposed, taking as an example a traffic incident management system. Its intelligent behavior is characterized by the physical-reflex space and the cyber-virtual space. In the physical-reflex space, the sensing actuation of the objective scenario is constructed on four levels of traffic infrastructure. In the cyber-virtual space, the decisions (through a Bayesian reasoning network) are defined according to three levels: principles, interrelated factors, and situation assessment. In [22] the potential participation of smartphones (equipped with GPS) is discussed to build a traffic information system (to inform the entire transportation network) that is part of the cyber-physical infrastructure system. In [23] a cloud-based cyber-physical system is presented, with the aim of finding fast routes for the users. The system is presented in four steps: (1) the GPS units on taxis are used as mobile sensors to measure the traffic status in the physical world; (2) the info generated by the taxis is sent to the cloud (cyber world) and mined, and then knowledge is acquired about the taxis’ preferred directions and traffic patterns on the roads; (3) the knowledge in the cloud is sent to the users through the Internet; and (4) the recommendations for a specific user are improved using that user’s driving behavior and preferred routes. In [24], a short-term traffic prediction model (combining fuzzy theory with a Markov process) is presented, which is part of a vehicular cyber-physical system; the prediction results are expressed in terms of traffic flow and speed. A proper discussion about the definition of a cyber-physical system, and its relationship with transportation, is in [25].

From a cyber-physical system point of view, in the procedure presented in this work, the cyber part corresponds to the elements in charge of acquiring and mining data to generate knowledge, together with the process of communicating that information to the users. The user (a biological entity) and intelligent devices (e.g., the user’s smartphone, the vehicle computer) reacting in response to the knowledge correspond to the physical part.

The aim of the present work is to introduce a method for analyzing speed data measured on streets where the traffic infrastructure is assumed to be the cause of low speeds. Then, we develop models and algorithms that, working with our data, allow predictions to be made. The procedure presented in this work is summarized in the following steps:

  • Street segmentation is performed considering traffic control elements (speed bumps and traffic lights).

  • Clustering speed data, validated with the silhouette metric.

  • With the Chi-Square distance (χ²), the travel speed histograms of weekdays are compared, as are the histograms of segments.

  • Mahalanobis distance is used to detect outliers.

  • Two techniques (polynomial and logistic regression) were used to develop the models that describe speed data. An algorithm for each modeling technique was developed to predict travel speed.

  • Communicate the generated knowledge to the users.

This chapter is organized as follows: Section 1 Introduction; Section 2 Method, which includes the theoretical frame (data, clusters, histograms, outliers) and the procedure (street segmentation, clustering, comparative analysis of histograms, outlier detection, mathematical models, connecting intel with users); Section 3 Results (with discussion); and Section 4 Conclusions (with future work).

2. Method

2.1 Theoretical frame

2.1.1 Data

The zone under study comprises two streets located in Lerma de Villada, Mexico: Av. Miguel Hidalgo and Av. Reolin Barejon. Data was obtained using the Google Maps Directions API. The time for a vehicle to traverse each segment was recorded every 15 min, following [6]. We found this time interval to be highly efficient for incorporating relevant data while ignoring redundant information. In this way, the average travel speed on each segment was measured. Three weeks (w1, w2, and w3) of data were considered: w1 from Dec 27, 2016 to Jan 03, 2017; w2 from Jan 03, 2017 to Jan 10, 2017; and w3 from Jan 20, 2017 to Jan 27, 2017. The time interval to acquire data was from 6 a.m. to 11:59 p.m. (an interval of 18 h per day) and only on weekdays, i.e., between Monday and Friday.

2.1.2 Clusters

The k-means technique [26] was selected (because it is easy to implement and is commonly used in distinct traffic problems [27, 28, 29]) to cluster the speed data of one of the 3 weeks; since the weeks are close in time, a similar travel speed is expected from one week to another, and we therefore select w1. In simple terms, the k-means technique calculates the centroid of each cluster as the mean of the data assigned to that cluster, and the assignments and centroids are recalculated until convergence.
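The assignment and update loop described above can be sketched for one-dimensional speed data as follows. This is a minimal illustration, not the implementation used in the chapter: the function name `kmeans_1d`, the deterministic initialization, and the sample speeds are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means for one-dimensional data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumed initialization: spread the starting centroids across the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, labels


# Made-up speeds in km/h, only to exercise the function.
speeds = [8.0, 9.5, 10.0, 20.0, 21.0, 22.5, 30.0, 31.5, 33.0]
centroids, labels = kmeans_1d(speeds, k=3)
```

With real data, a library implementation with several random restarts would normally be preferable, since the result of k-means depends on the initial centroids.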

We apply the k-means technique selecting a number of clusters in the range 3–6; for each case we calculate the silhouette score [30], given in Eq. (1), where $a_i$ is the average distance from data point $i$ to the other data in the same cluster, $b_i$ is the minimum, over the other clusters, of the average distance from $i$ to the data in that cluster, and $i$ is the data index. The silhouette score is in the range −1 to +1; a value close to +1 indicates that the speed data is well matched to the selected clusters, while a value close to −1 indicates the opposite situation:

$$ss_i = \frac{b_i - a_i}{\max(a_i, b_i)} \tag{1}$$
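A minimal sketch of Eq. (1) for one-dimensional speed data is given below. The function name, the use of the absolute difference as the distance, and the convention of assigning a score of 0 to a point that is alone in its cluster are assumptions; the overall score is taken here as the mean of the per-point values, which is the usual convention but is not stated explicitly in the text.

```python
def silhouette_scores(values, labels):
    """Per-point silhouette ss_i = (b_i - a_i) / max(a_i, b_i) for 1-D data."""
    scores = []
    for i, (v, own) in enumerate(zip(values, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (w, lab) in enumerate(zip(values, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(abs(v - w))
        if own not in by_cluster or len(by_cluster) < 2:
            scores.append(0.0)  # assumed convention: singleton or single cluster
            continue
        a_i = sum(by_cluster[own]) / len(by_cluster[own])
        b_i = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        denominator = max(a_i, b_i)
        scores.append((b_i - a_i) / denominator if denominator > 0 else 0.0)
    return scores


# Made-up values with an obvious two-cluster structure.
sample_values = [1.0, 2.0, 10.0, 11.0]
sample_labels = [0, 0, 1, 1]
per_point = silhouette_scores(sample_values, sample_labels)
overall = sum(per_point) / len(per_point)
```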

2.1.3 Histograms

By analyzing the speed frequency, i.e., comparing the speed histograms of certain locations (spatial selection) and certain times (temporal selection), we expected to find spatial and temporal relationships: the weekdays on which the speed is similar (or dissimilar) and the segments on which the speed is similar (or dissimilar).

The metric employed to compare a pair of histograms is the Chi-Square (χ²) histogram distance [31], given in Eq. (2), where $P$ and $Q$ are the histograms to be compared and $P_i$ and $Q_i$ contain the speed frequency of bin $i$ ($i$ is the bin index; the selected bin width is 1):

$$\chi^2(P, Q) = \frac{1}{2} \sum_i \frac{(P_i - Q_i)^2}{P_i + Q_i} \tag{2}$$

This metric has the advantage of reducing the weight of differences between bins with large counts, since in many natural histograms the difference between bins with high values is less important [31]. If the metric gives a result of 0, there is no difference between the compared histograms; as the value becomes larger, the difference in terms of the speed frequency also becomes larger.
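Eq. (2) can be sketched in a few lines. The helper names, the clamping of speeds above the last bin, and the choice to skip bins that are empty in both histograms (to avoid a 0/0 term) are assumptions made for this illustration.

```python
def speed_histogram(speeds, n_bins):
    """Count speeds in bins of width 1 km/h; values past the last bin are clamped."""
    counts = [0] * n_bins
    for s in speeds:
        counts[min(int(s), n_bins - 1)] += 1
    return counts


def chi_square_distance(p, q):
    """Chi-Square histogram distance of Eq. (2) between equal-length histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i + q_i > 0:  # bins empty in both histograms contribute nothing
            total += (p_i - q_i) ** 2 / (p_i + q_i)
    return 0.5 * total


# Made-up histograms, only to exercise the functions.
hist_p = [0, 4, 6, 0]
hist_q = [0, 2, 6, 2]
distance = chi_square_distance(hist_p, hist_q)
```

Identical histograms give a distance of 0, in line with the interpretation given above.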

2.1.4 Outliers

We filtered the speed data using the Mahalanobis distance (MD) [32] to detect outliers, i.e., atypical speeds not belonging to normal driving behavior, since we are not interested in including this data for modeling. The MD is presented in Eq. (3), where $x_i$ is a vector containing the time and speed, $\bar{x}$ is the vector of means, and $C_x^{-1}$ is the inverse of the covariance matrix:

$$MD_i = \sqrt{(x_i - \bar{x})^T C_x^{-1} (x_i - \bar{x})} \tag{3}$$
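For two-dimensional (time, speed) vectors, Eq. (3) can be written out explicitly because a 2×2 covariance matrix is easy to invert by hand. The sketch below assumes the sample covariance (dividing by n − 1) and the square-root form of the distance; the function name and the sample points are hypothetical.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each (time, speed) pair to the sample mean."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    # Entries of the sample covariance matrix (assumed n - 1 normalization).
    c_tt = sum((t - mean_t) ** 2 for t, _ in points) / (n - 1)
    c_ss = sum((s - mean_s) ** 2 for _, s in points) / (n - 1)
    c_ts = sum((t - mean_t) * (s - mean_s) for t, s in points) / (n - 1)
    det = c_tt * c_ss - c_ts ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of the symmetric 2x2 covariance matrix.
    i_tt, i_ss, i_ts = c_ss / det, c_tt / det, -c_ts / det
    distances = []
    for t, s in points:
        d_t, d_s = t - mean_t, s - mean_s
        squared = d_t * d_t * i_tt + 2 * d_t * d_s * i_ts + d_s * d_s * i_ss
        distances.append(math.sqrt(squared))
    return distances


# Made-up points on a unit square; by symmetry all distances are equal.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
md_values = mahalanobis_2d(square)
```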

2.2 Procedure

2.2.1 Street segmentation

The avenues under study were divided into segments: each segment is denoted $s_k$, with $k$ as the segment index. On each segment, we have the number of speed bumps $c_1$, the number of traffic lights $c_2$, and the landmarks $c_3$. A segment’s length $l$ is set to approximately 500 m, and then on each segment there are specific traffic elements: $s_k = \{c_1, c_2, c_3, l\}$, as shown in Table 1.

| $s_k$ | $c_1$ | $c_2$ | $c_3$ | $l$ (m) | GPS start coordinate | GPS end coordinate |
|---|---|---|---|---|---|---|
| s0 | 2 | 0 | None | 501 | 19.284512, −99.500927 | 19.285725, −99.505498 |
| s1 | 2 | 0 | School, museum, gas station, government offices | 500 | 19.285725, −99.505498 | 19.286330, −99.510221 |
| s2 | 0 | 1 | Banks, center square, school, fast-food restaurants | 500 | 19.286330, −99.510221 | 19.286711, −99.514964 |
| s3 | 3 | 2 | Cultural center, hospital, school offices, kindergarten | 501 | 19.286711, −99.514964 | 19.286477, −99.519630 |
| s4 | 3 | 0 | Telecom company offices, shopping mall | 499 | 19.286477, −99.519630 | 19.285784, −99.514944 |
| s5 | 2 | 1 | Hospital, government offices, cultural forum | 500 | 19.285784, −99.514944 | 19.284943, −99.510282 |
| s6 | 4 | 1 | School, supermarket, hospital | 500 | 19.284943, −99.510282 | 19.284500, −99.505561 |
| s7 | 2 | 0 | None | 481 | 19.284500, −99.505561 | 19.284403, −99.500993 |

Table 1.

Segments’ characteristics.

2.2.2 Clustering

The best silhouette score is obtained with three clusters, ss = 0.7360. For four, five, and six clusters, we calculated ss = 0.7331, ss = 0.7194, and ss = 0.7105, respectively. As we were interested in communicating in a simple way the speed category at which it is possible to travel, three options (slow, medium, and normal) seem adequate. A similar approach is used in Google Maps (traffic option), where the speed is represented with four options, from fast to slow.

The resulting average speed range (in km/h) of each cluster (or category) is as follows: category 1 (5.4112–18.1455), category 2 (18.1455–23.4234), and category 3 (23.4234–36.0750). For w2 and w3, values smaller than 5.4112 fall into category 1, and those larger than 36.0750 fall into category 3.
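The category ranges above translate into a simple lookup. The sketch below is an illustration only: the function name is hypothetical, and the text does not say which category a speed lying exactly on a boundary belongs to, so assigning it to the higher category is an assumption.

```python
# Upper bounds (km/h) of categories 1 and 2, taken from the ranges in the text.
CATEGORY_1_UPPER = 18.1455
CATEGORY_2_UPPER = 23.4234


def speed_category(speed_kmh):
    """Map an average travel speed to category 1, 2, or 3.

    Speeds below 5.4112 km/h fall into category 1 and speeds above
    36.0750 km/h into category 3, as stated for w2 and w3.
    Boundary handling (value equal to a bound) is an assumption.
    """
    if speed_kmh < CATEGORY_1_UPPER:
        return 1
    if speed_kmh < CATEGORY_2_UPPER:
        return 2
    return 3


# Made-up speeds, only to exercise the function.
example_categories = [speed_category(s) for s in (3.0, 20.0, 40.0)]
```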

The percentage of a segment’s speed data (from w1) in a cluster is shown in Table 2. It is interesting to note that for all segments, there is a specific cluster that contains a high percentage of data (at least 79.61%), which validates our clustering results.

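| Segment | Cluster 1 (%) | Cluster 2 (%) | Cluster 3 (%) |
|---|---|---|---|
| s0 | 0.55 | 9.9 | 89.53 |
| s1 | 11.84 | 80.71 | 7.43 |
| s2 | 88.98 | 11.01 | 0 |
| s3 | 93.66 | 6.33 | 0 |
| s4 | 14.04 | 85.95 | 0 |
| s5 | 79.61 | 20.38 | 0 |
| s6 | 83.47 | 16.52 | 0 |
| s7 | 4.13 | 3.85 | 92.01 |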

Table 2.

Percentage of speed data in a cluster.

2.2.3 Comparative analysis of histograms

First, we consider all segments as a single road, and then the histograms of the speed frequency (from 6 a.m. to 11:59 p.m.) observed on weekdays (in w1) are compared in pairs, with the Chi-Square metric presented in Eq. (2). The results are shown in Table 3, starting with the lowest χ² value, i.e., the most similar histograms among weekdays, with D1 = Monday, D2 = Tuesday, and so on.

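| Pair | D2–D3 | D4–D5 | D3–D4 | D2–D4 | D1–D3 | D1–D2 | D1–D5 | D3–D5 | D2–D5 | D1–D4 |
|---|---|---|---|---|---|---|---|---|---|---|
| χ² | 7.77 | 10.758 | 10.936 | 11.139 | 14.097 | 15.168 | 16.347 | 16.827 | 17.609 | 20.653 |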

Table 3.

Chi-Square distance between histograms with weekdays’ data.

Second, the speed data of all weekdays, separated by individual segment, was used to build the histograms of the speed frequency observed on each segment over 5 days (the weekdays of w1). These histograms were compared in pairs with the χ² distance. Table 4 shows the results starting with the lowest χ². We found that if the compared segments share similar traffic elements, the speed frequency is also similar, and therefore a low χ² is obtained.

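| Pair | χ² | Pair | χ² | Pair | χ² | Pair | χ² |
|---|---|---|---|---|---|---|---|
| s2–s5 | 20.37 | s3–s5 | 98.59 | s1–s2 | 238.39 | s2–s7 | 339.58 |
| s3–s6 | 34.17 | s4–s5 | 167.65 | s0–s1 | 269.15 | s5–s7 | 340.24 |
| s5–s6 | 39.01 | s4–s6 | 185.37 | s1–s3 | 271.36 | s0–s5 | 342.78 |
| s1–s4 | 45.57 | s1–s5 | 198.5 | s1–s7 | 303.87 | s0–s6 | 344.59 |
| s2–s6 | 46.05 | s2–s4 | 212.47 | s0–s4 | 321.89 | s0–s2 | 346.80 |
| s0–s7 | 73.75 | s3–s4 | 220.04 | s4–s7 | 337.16 | s3–s7 | 350.76 |
| s2–s3 | 88.591 | s1–s6 | 226.01 | s6–s7 | 337.43 | s0–s3 | 356.21 |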

Table 4.

Chi-Square distance between histograms with segments data.

Figure 1 shows the most dissimilar histograms, s0 and s3. Table 1 shows that s0 has two speed bumps and no traffic lights, while s3 has three speed bumps and two traffic lights; because of the traffic lights on s3, we expect a lower speed on this segment, and this expectation is corroborated by Figure 1.

Figure 1.

Dissimilar histograms (s0 and s3).

Figure 2 shows the most similar histograms, s2 and s5. Segments s2 and s5 share the same number of traffic lights; however, there are two speed bumps on s5 and none on s2, so a slightly higher speed is expected on s2 (see Figure 2).

Figure 2.

Similar histograms (s2 and s5).

Tables 3 and 4 show that comparing histograms of the speed frequency of individual days (over all segments) yields lower χ² values (from 7.77 to 20.653) than comparing histograms of the speed frequency of individual segments (over all days), where the values range from 20.37 to 356.21. It therefore appears that the travel speed is weakly influenced by the day of the week, since the traffic control elements of the whole road are the same from day to day. However, the segment seems to influence the travel speed strongly, since the traffic control elements that characterize each segment modify the speed at which it is possible to travel.

To corroborate the abovementioned statement, we use the speed frequency of w2. Figure 3 shows the histograms of the speed frequency of each day (over all segments), where the similarity of the histograms can be observed. Figure 4 shows the histograms of the speed frequency of each segment (over all days), where the dissimilarities of the histograms can be observed.

Figure 3.

Speed frequency of days.

Figure 4.

Speed frequency of segments.

2.2.4 Outlier detection

As an example, the speed data of s0 and w1 is presented in Figure 5. We calculate the MD of this data, and the probability density of the MD is presented in Figure 6, which has mean = 1.2331 and standard deviation SD = 0.6894. From Figure 6, a point with MD > (2*SD + mean) = 2.6119 corresponds to a red point in Figure 5 and is considered an atypical point. The cutoff value, i.e., (2*SD + mean), was established through trial and error.

Figure 5.

Time vs. speed: Data of s0 and w1.

Figure 6.

Mahalanobis distance vs. probability density.

The speed data from w1 and w2, for all segments, is filtered in the same way as in the example. The data used in the polynomial regression satisfy MD ≤ (2*SD + mean), and the data used in the logistic regression satisfy MD ≤ (3*SD + mean).
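The filtering rule can be sketched as follows, with the multiplier `k` set to 2 for the polynomial regression data and 3 for the logistic regression data. The function name and the made-up distances are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def filter_outliers(records, distances, k=2.0):
    """Keep the records whose Mahalanobis distance satisfies MD <= k*SD + mean."""
    cutoff = statistics.mean(distances) + k * statistics.stdev(distances)
    kept = [record for record, d in zip(records, distances) if d <= cutoff]
    return kept, cutoff


# Cutoff reported in the text for s0 and w1: mean = 1.2331, SD = 0.6894.
reported_cutoff = 1.2331 + 2 * 0.6894

# Made-up distances: nine typical points and one clearly atypical point.
sample_distances = [1.0] * 9 + [5.0]
sample_records = list(range(10))
kept_records, sample_cutoff = filter_outliers(sample_records, sample_distances, k=2.0)
```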

2.2.5 Mathematical models

Polynomial: The data of each segment, with time as the independent variable and travel speed as the dependent variable, is modeled with a fifth-degree polynomial, which allows up to four changes in the speed trend (the common requirement from the observations). The coefficients are calculated with the least-squares regression technique.

The following terminology is used to describe the model: the data size of all segments is $N = \sum_{k=0}^{7} N_k$, with $N_k$ referring to the data size of segment $k$. The observed speed with index $i$ is denoted by $y_i$, while the corresponding time is $t_i$. The speed model of segment $k$ and week $q$ is denoted by $M_{k,q}(i)$, with $k = \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $q = \{1, 2\}$. The model is presented in Eq. (4), where the coefficients $\varphi_1, \ldots, \varphi_6$ were calculated with the speed data of the corresponding week ($q$) and segment ($k$):

$$M_{k,q}(i) = \varphi_1 + \varphi_2 t_i + \varphi_3 t_i^2 + \varphi_4 t_i^3 + \varphi_5 t_i^4 + \varphi_6 t_i^5 \tag{4}$$
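Once the six coefficients of a segment and week are available, evaluating Eq. (4) is a single polynomial evaluation. The sketch below uses Horner's rule; the function name and the coefficient values are purely illustrative and are not fitted values from the chapter.

```python
def polynomial_speed(coefficients, t):
    """Evaluate phi_1 + phi_2*t + ... + phi_6*t**5 using Horner's rule.

    `coefficients` lists phi_1 ... phi_6 in increasing order of power.
    """
    result = 0.0
    for c in reversed(coefficients):
        result = result * t + c
    return result


# Hypothetical coefficients, chosen only so the result is easy to verify by hand:
# 1 + 2*t + 3*t**2 at t = 2 gives 1 + 4 + 12 = 17.
illustrative_phi = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
value_at_2 = polynomial_speed(illustrative_phi, 2.0)
```

For the fitting step itself, a least-squares routine from a numerical library is usually a safer choice than hand-written normal equations, because powers of the time variable up to the fifth degree can make the problem poorly conditioned.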

Multinomial logistic: The numbers of speed bumps and traffic lights (see Table 1) are used to explain the speed. With multinomial logistic regression [33], we obtained the logistic model presented in Eq. (5), with $\psi \in \{a, b\}$:

$$E_{\psi,q}(i) = \psi_1 + \psi_2 v_1(i) + \psi_3 v_2(i) + \psi_4 v_3(i) + \psi_5 v_4(i) + \psi_6 v_5(i) \tag{5}$$

The coefficients are denoted by $\psi_1, \ldots, \psi_6$, and $q = \{1, 2\}$ refers again to the data from w1 and w2, respectively. The explanatory variables are $v_1$ = day weight, $v_2$ = number of speed bumps, $v_3$ = number of traffic lights, $v_4$ = segment weight, and $v_5$ = time. The weight of a specific day is calculated as the day’s average speed (of the speed measured from 6 a.m. to 11:59 p.m.) divided by the sum of the average speeds of all weekdays. A segment’s weight is calculated as the segment’s average speed (during weekdays) divided by the sum of the average speeds of all segments.
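The day and segment weights share the same form: each average speed is divided by the sum of all the averages, so the weights add up to 1. A minimal sketch follows, with a hypothetical function name and made-up average speeds.

```python
def normalized_weights(average_speeds):
    """Divide each average speed by the sum of all averages (weights sum to 1)."""
    total = sum(average_speeds.values())
    return {key: value / total for key, value in average_speeds.items()}


# Made-up daily average speeds in km/h, not values from the study.
daily_averages = {"D1": 20.0, "D2": 30.0, "D3": 50.0}
day_weights = normalized_weights(daily_averages)
```

The same function applies to segment weights when the dictionary holds the average speed of each segment.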

In Eq. (5), $E_{a,q}$ gives the logarithm of the relative risk of being in cluster 1 vs. cluster 3 (the reference), and $E_{b,q}$ gives the same quantity for cluster 2 vs. cluster 3. The conversion to probability is given in Eqs. (6)–(8), where $R_{j,q}$ is the probability of belonging to category $j$, with $j = \{1, 2, 3\}$:

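$$R_{1,q}(i) = \frac{e^{E_{a,q}(i)}}{1 + e^{E_{a,q}(i)} + e^{E_{b,q}(i)}} \tag{6}$$

$$R_{2,q}(i) = \frac{e^{E_{b,q}(i)}}{1 + e^{E_{a,q}(i)} + e^{E_{b,q}(i)}} \tag{7}$$

$$R_{3,q}(i) = 1 - \left(R_{1,q}(i) + R_{2,q}(i)\right) \tag{8}$$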
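Eqs. (5)–(8) can be sketched together: a linear predictor is computed for each of the two non-reference categories and then converted into three probabilities. The function names and all numeric inputs below are hypothetical; the coefficients are not the fitted values of the chapter.

```python
import math


def linear_predictor(psi, v):
    """Eq. (5): psi_1 + psi_2*v_1 + ... + psi_6*v_5 for one observation."""
    if len(psi) != 6 or len(v) != 5:
        raise ValueError("expected 6 coefficients and 5 explanatory variables")
    return psi[0] + sum(p * x for p, x in zip(psi[1:], v))


def category_probabilities(e_a, e_b):
    """Eqs. (6)-(8): probabilities of categories 1, 2, and 3 (the reference)."""
    exp_a, exp_b = math.exp(e_a), math.exp(e_b)
    denominator = 1.0 + exp_a + exp_b
    r1 = exp_a / denominator
    r2 = exp_b / denominator
    r3 = 1.0 - (r1 + r2)
    return r1, r2, r3


# With both predictors at 0, the three categories are equally likely.
equal_case = category_probabilities(0.0, 0.0)
# With e_a = ln 2 and e_b = 0, the probabilities are 0.5, 0.25, and 0.25.
skewed_case = category_probabilities(math.log(2.0), 0.0)
# Hypothetical coefficients and variables, only to show the call shape.
example_e = linear_predictor([1.0, 0.0, -0.5, -1.0, 0.0, 0.0], [0.2, 2, 1, 0.1, 8.0])
```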

2.2.6 Connecting intel with users

With the developed procedure, knowledge is acquired about the speed at which vehicles are expected to travel on the segments under study. The architecture design (and the implementation) to connect this information with the users is outside the scope of this work (it is planned as future work); nevertheless, we present the basic idea in this section.

The algorithms developed (in Appendix A and Appendix B) were programmed on a regular computer; according to the procedure presented, the data acquired (from the zone under study) is modeled, and the models are used in the algorithms to generate knowledge. The link between this knowledge and the users could be established through a cell phone app (via the Internet). When a driver is in the proximity of a street segment, the cell phone (with GPS) detects the current location and acquires information for the driver, such as the number of bumps and traffic lights and the expected travel speed calculated with the proposed algorithms. This info is presented to the driver in a way that does not distract them, and the driver can then decide on the most convenient route. A more challenging design is to connect the cell phone with the vehicle (assuming that an intelligent system is part of it and can control some functions) so that, for example, when the vehicle is approaching a speed bump, it automatically decelerates (if the driver is not reacting adequately).

The program running on a computer, in charge of acquiring and mining data to generate knowledge and of establishing communication with the responsive elements, forms the “cyber” part of the system. The elements reacting with intelligence to the information delivered, such as the driver, the cell phone, and the vehicle, form the “physical” part of the system. Finally, the cyber and physical parts combined form a cyber-physical system.

3. Results

3.1 Polynomial regression model and Algorithm 1

The error between the modeled data, obtained with Eq. (4), and the observed data was calculated with the mean absolute error (MAE) (see Eq. (9)) [34]. Here, $n = N_k$, and $y_i$ and $\hat{y}_i$ are the observed and modeled data, respectively. Table 5 shows the MAE and its standard deviation (SD) for the data of w1 and w2 and the respective models, $M_{k,1}$ and $M_{k,2}$.

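| Segment | $M_{k,1}$ MAE (km/h) | $M_{k,1}$ SD (km/h) | $M_{k,2}$ MAE (km/h) | $M_{k,2}$ SD (km/h) |
|---|---|---|---|---|
| s0 | 0.8269 | 0.6802 | 0.7895 | 0.6321 |
| s1 | 1.0101 | 0.9630 | 1.1939 | 0.9916 |
| s2 | 0.9523 | 0.7622 | 1.1754 | 1.0670 |
| s3 | 0.6198 | 0.4628 | 0.6701 | 0.5917 |
| s4 | 0.8435 | 0.6882 | 0.9765 | 0.7416 |
| s5 | 0.8971 | 0.6826 | 0.9234 | 0.7408 |
| s6 | 0.8438 | 0.7297 | 0.7737 | 0.6680 |
| s7 | 0.9376 | 1.0762 | 0.7894 | 0.6653 |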

Table 5.

MAE and SD.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{9}$$
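Eq. (9) and the accompanying SD can be sketched as follows. The text does not specify how the SD is computed; taking it as the population standard deviation of the absolute errors is an assumption, as are the function name and the sample values.

```python
import statistics


def mae_and_sd(observed, modeled):
    """Return the mean absolute error (Eq. 9) and the SD of the absolute errors."""
    if len(observed) != len(modeled):
        raise ValueError("observed and modeled data must have the same length")
    errors = [abs(y - y_hat) for y, y_hat in zip(observed, modeled)]
    # Assumption: SD is the population standard deviation of the absolute errors.
    return statistics.mean(errors), statistics.pstdev(errors)


# Made-up observed and modeled speeds in km/h; absolute errors are 1, 0, and 2.
observed_speeds = [10.0, 12.0, 14.0]
modeled_speeds = [11.0, 12.0, 12.0]
mae, sd = mae_and_sd(observed_speeds, modeled_speeds)
```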

An algorithm (Appendix A, Algorithm 1) is designed to predict the speed of w3 using the modeled equations ($M_{k,1}$ and $M_{k,2}$) and historical data, i.e., the data available from w3 before the current time. The error between the observed (from w3) and predicted (with Algorithm 1) travel speeds is calculated with Eq. (9). The MAE, SD, and hits (percentage of data categorized correctly) for w3, using Algorithm 1, are shown in Table 6.

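| Segment | MAE (km/h) | SD (km/h) | Hits (%) |
|---|---|---|---|
| s0 | 0.7757 | 0.7142 | 92.4 |
| s1 | 0.9353 | 1.0061 | 85.2 |
| s2 | 0.8641 | 0.7537 | 89.5 |
| s3 | 0.7749 | 0.7658 | 94.8 |
| s4 | 1.0053 | 1.0903 | 85.2 |
| s5 | 0.8051 | 0.7942 | 93.5 |
| s6 | 0.6968 | 0.6639 | 96.5 |
| s7 | 0.7994 | 0.9391 | 93.5 |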

Table 6.

Algorithm 1 prediction results: MAE, SD, and hits.
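The hits column is a percentage of correctly categorized data, and the average over the eight segments in Table 6 reproduces the 91.3% figure quoted in the abstract. The sketch below shows both calculations; the function name and the short category lists are hypothetical, while the hits values are copied from Table 6.

```python
def hit_rate(observed_categories, predicted_categories):
    """Percentage of predictions whose category matches the observed category."""
    matches = sum(o == p for o, p in zip(observed_categories, predicted_categories))
    return 100.0 * matches / len(observed_categories)


# Hits (%) per segment, copied from Table 6.
table_6_hits = {"s0": 92.4, "s1": 85.2, "s2": 89.5, "s3": 94.8,
                "s4": 85.2, "s5": 93.5, "s6": 96.5, "s7": 93.5}
average_hits = sum(table_6_hits.values()) / len(table_6_hits)  # about 91.3

# Made-up category sequences, only to exercise hit_rate.
example_rate = hit_rate([1, 2, 3, 3], [1, 2, 3, 1])
```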

Figure 7 shows, as an example, the observed speed data (in black circles) of w3 and segment s0, the modeled data with w1 (model $M_{0,1}$, in blue dots) and w2 (model $M_{0,2}$, in green dots), and the estimated speed with Algorithm 1 (in red plus signs).

Figure 7.

Time vs. speed: data of s0 and w3.

3.2 Multinomial logistic regression model and Algorithm 2

Algorithm 2 (see Appendix B) is used to predict the speed category of the observed data from w3. $H_1(i)$ and $H_2(i)$ are two data sets obtained from w1 and w2, respectively. These sets store the category associated with the average speed in a time interval from $t_i - 0.5$ to $t_i + 0.5$ (0.5 h = 30 min), centered on $t_i$, for the day and segment under evaluation. $H_3(i)$ is the speed category of w3, which is only available for previous data, i.e., prior to $i$, with $i = 1, \ldots, N$ being the data index. The most likely probability is $P_q(i) = \max\{R_{1,q}(i), R_{2,q}(i), R_{3,q}(i)\} = R_{x,q}(i)$, and the corresponding category is stored in $S_q(i) = x$, where subindex $q = \{1, 2\}$ refers to the week. A threshold value, selected through trial and error, is used to discard the result in $S_q(i)$ if $P_q(i) <$ threshold. Algorithm 2 predicts the speed category for w3, which is stored in $S_3(i)$. Choosing threshold = 0.9 gives 90.09% of correct evaluations. This percentage is the number of cases where $S_3(i)$ was categorized correctly divided by the total data $N$.
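The selection and threshold step described above can be sketched in isolation. This is not Algorithm 2 itself, which is given in Appendix B and also draws on the historical sets; the function name is hypothetical, and returning `None` for a discarded result is an assumption made for this illustration.

```python
def most_likely_category(probabilities, threshold):
    """Pick the category with the highest probability, or None if it is discarded.

    `probabilities` holds the probabilities of categories 1, 2, and 3.
    The category is discarded (None) when its probability is below the threshold.
    """
    best_index = max(range(len(probabilities)), key=lambda j: probabilities[j])
    best_probability = probabilities[best_index]
    category = best_index + 1 if best_probability >= threshold else None
    return category, best_probability


# Made-up probabilities: the first case passes a 0.9 threshold, the second does not.
confident = most_likely_category((0.05, 0.92, 0.03), threshold=0.9)
uncertain = most_likely_category((0.5, 0.3, 0.2), threshold=0.9)
```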

Afterward, we attempted to predict the speed category of the observed speed in w3 under the assumption that the set $H_2(i)$ is composed only of the average speed of each segment, without including $H_1$. The optimum result was found with threshold ≈ 0.85, with 85.62% of correct predictions. If the threshold value is reduced, the rate of correct predictions decreases (because the model fails to predict accurately with that threshold value). Similarly, if the threshold is increased, it becomes more difficult to satisfy the condition $P_q(i) \geq$ threshold, and the rate of correct predictions also drops because the set $H_2(i)$ (with the limitation mentioned before) now contributes more. Table 7 shows the percentage of speed data categorized correctly with different threshold values.

| Threshold | Prediction (%) |
|---|---|
| 0.75 | 81.69 |
| 0.80 | 83.10 |
| 0.85 | 85.62 |
| 0.90 | 84.24 |
| 0.95 | 83.97 |

Table 7.

Algorithm 2: threshold values and w3 prediction results.

3.3 Discussion

The steps applied in the numerical example combine into a new method for speed prediction. The first step, street segmentation, divides an avenue so that different traffic elements fall on different segments. These elements are the number of speed bumps, traffic lights, and landmarks, which lead to different speed behavior on each segment. The second step, clustering, selects the intervals that best fit the observed travel speeds, resulting in three categories. Depending on the segment, most of the speed data (at least about 80%) falls within a specific cluster (category). For example, we infer that the speed behaviors in $s_2$ and $s_5$ are similar, since most of the speeds for both fall inside cluster 1. In contrast, the speed behaviors of $s_0$ and $s_3$ are dissimilar, since most of their speeds belong to different clusters (3 and 1, respectively). In the third step, comparative analysis of histograms, we corroborate that, for each segment, the speed behavior is related to the traffic elements involved. The speed histograms of two segments yield a low Chi-Square distance if the segments share approximately the same number of speed bumps, traffic lights, and landmarks, regardless of the day of the week. A high Chi-Square distance implies the opposite situation, i.e., segments with different numbers of traffic elements. The fourth step, outlier detection, removes atypical speed behavior, e.g., a vehicle circulating slower or faster than usual. In the fifth step, mathematical modeling, the models explain the speed. From steps 2 and 3, it is already known that on each segment the speed behaves according to the traffic elements involved; hence the speed data of each segment is modeled independently with a polynomial model, with time as the independent variable. The multinomial logistic model uses as independent variables the number of speed bumps, the number of traffic lights, the time, and two weights. The weights are calculated from the average of the measured travel speeds, considering segments and days. Finally, in step 6, delivering information to the users, the drivers are informed about the travel speed expected on the surrounding segments, helping them to continuously adjust their route.
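The histogram comparison in step 3 can be illustrated with a short sketch. It assumes the symmetric form of the Chi-Square distance, $0.5 \sum_b (p_b - q_b)^2 / (p_b + q_b)$, which is one of several common variants and may differ from the one used in this chapter. The bin edges, the helper names `histogram` and `chi_square_distance`, and the speed values are hypothetical and only serve the illustration.

```python
from typing import Sequence


def histogram(speeds: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Return relative frequencies of `speeds` over the bins defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for v in speeds:
        for b in range(len(counts)):
            last = b == len(counts) - 1
            # Bins are half-open [lo, hi), except the last one, which is closed.
            if edges[b] <= v < edges[b + 1] or (last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def chi_square_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric Chi-Square distance (assumed variant); empty bin pairs are skipped."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Synthetic speeds in km/h (illustrative only, not the chapter's data).
edges = [0, 10, 20, 30, 40, 50]
seg_a = [12, 14, 15, 18, 22, 25, 11, 16]  # segment with many traffic elements
seg_b = [13, 15, 17, 19, 21, 24, 12, 28]  # segment with similar elements
seg_c = [35, 38, 41, 44, 46, 33, 39, 42]  # segment with few traffic elements

h_a, h_b, h_c = (histogram(s, edges) for s in (seg_a, seg_b, seg_c))
d_similar = chi_square_distance(h_a, h_b)
d_different = chi_square_distance(h_a, h_c)
print(d_similar, d_different)
```

With relative frequencies, this variant ranges from 0 (identical histograms) to 1 (no overlapping bins), so the two synthetic segments with similar speeds give a distance close to 0 and the dissimilar pair gives the maximum.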

4. Conclusions

The procedure presented in this chapter starts from street segmentation; on each segment there are traffic elements that we infer may be related to the observed speed frequency. By comparing speed histograms, we found that the speed frequency of all segments taken together is similar across weekdays, and that the speed frequency of a specific segment is similar regardless of the day. Considering the speed frequency over all weekdays for individual segments, the segments with different traffic elements (speed bumps, traffic lights, and landmarks) yield dissimilar travel speeds. From this observation, two techniques were considered for modeling speed: (1) polynomial regression, where the data of each segment is modeled independently, using time as the independent variable, and (2) multinomial logistic regression, with several independent variables: the number of speed bumps and traffic lights, the time, and two weights (from the observed speeds on street segments and weekdays). The models were implemented in algorithms that use the modeled and historical data. With the polynomial model and Algorithm 1, the travel speed was categorized correctly in 85.2 to 96.5% of cases, depending on the segment. The multinomial logistic model and Algorithm 2 correctly predict the speed category in 90.09% of the evaluated cases. With these results, we conclude that the proposed procedure is suitable to prepare and model speed data and then to predict the speed category at a low computational cost. The procedure is useful to establish the relationship between traffic infrastructure and travel speed.

4.1 Future work

As future work, we plan to develop the architecture to communicate the expected travel speed (obtained with the proposed procedure) to the users, as well as to convert this knowledge into suggestions and decision-making.

In Algorithm 1, if $i \le \mathit{deep}$ (line 3), the modeled speeds of $w_1$ and $w_2$ contribute equally (each is multiplied by 0.5). The case $i \ge \mathit{deep} + 1$ (line 6) enables the estimation of $\bar{y}_{i-1}$ and $\bar{y}_i$ with known data from $w_3$. Variables $h_1$ and $h_2$ (see lines 9 to 14) store the average of the absolute difference between historical and modeled data, from $w_1$ and $w_2$, respectively. $h_3$ (line 15) stores the absolute difference between the historical and estimated data from $w_3$ at index $i-1$. The condition in line 16 verifies that the speeds $y_{i-3}$ to $y_{i-1}$ are nonempty, i.e., available. In line 17, $h_1$ to $h_3$ are normalized and converted to weights, named $W_1$ to $W_3$. Because $h$ carries the error, a greater $h$ results in a smaller $W$, and vice versa. In line 18, the predicted speed is calculated using the weights, the speeds modeled with $w_1$ and $w_2$, and the estimation from previous $w_3$ data. If the condition in line 16 is not true, then in line 21 the speed prediction is calculated with the modeled data and new weights, without the $w_3$ data.
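The inverse relationship between error and weight can be written out explicitly. The block below is a reading of the weighting step, assuming that each weight is the normalized reciprocal of its error term; the numeric values are illustrative and not taken from the measurements.

```latex
W_m = \frac{1/h_m}{\sum_{n=1}^{3} 1/h_n}, \qquad \sum_{m=1}^{3} W_m = 1 .

% Worked example with illustrative (not measured) errors:
(h_1, h_2, h_3) = \left(\tfrac{1}{7}, \tfrac{2}{7}, \tfrac{4}{7}\right)
\;\Rightarrow\;
\left(\tfrac{1}{h_1}, \tfrac{1}{h_2}, \tfrac{1}{h_3}\right) = \left(7, \tfrac{7}{2}, \tfrac{7}{4}\right)
\;\Rightarrow\;
(W_1, W_2, W_3) = \left(\tfrac{4}{7}, \tfrac{2}{7}, \tfrac{1}{7}\right).
```

Under this reading, the source with the smallest error receives the largest weight. Scaling all error terms by a common factor leaves the weights unchanged, so the prior normalization of $h_1$ to $h_3$ does not alter the result.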

Algorithm 1

Initial conditions: deep = 3;

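  1. for $k = 0$ to $k = 7$
  2. for $i = 1$ to $i = N_k$
  3. if $i \le \mathit{deep}$
  4. $\hat{y}_i = M_{k,1}(i) \cdot 0.5 + M_{k,2}(i) \cdot 0.5$
  5. end if
  6. if $i \ge \mathit{deep} + 1$
  7. $\bar{y}_{i-1} = y_{i-2} + (y_{i-2} - y_{i-3})$
  8. $\bar{y}_i = y_{i-1} + (y_{i-1} - y_{i-2})$
  9. $h_1 = 0$; $h_2 = 0$;
  10. for $j = 1$ to $j = \mathit{deep}$
  11. $h_1 = h_1 + |y_{i-j} - M_{k,1}(i-j)|$
  12. $h_2 = h_2 + |y_{i-j} - M_{k,2}(i-j)|$
  13. end for
  14. $h_1 = h_1 / \mathit{deep}$; $h_2 = h_2 / \mathit{deep}$
  15. $h_3 = |y_{i-1} - \bar{y}_{i-1}|$
  16. if $y_{i-1}$, $y_{i-2}$, and $y_{i-3}$ are nonempty
  17. $h_1 = \frac{h_1}{h_1 + h_2 + h_3}$; $h_2 = \frac{h_2}{h_1 + h_2 + h_3}$; $h_3 = \frac{h_3}{h_1 + h_2 + h_3}$; $W_1 = \frac{1/h_1}{1/h_1 + 1/h_2 + 1/h_3}$; $W_2 = \frac{1/h_2}{1/h_1 + 1/h_2 + 1/h_3}$; $W_3 = \frac{1/h_3}{1/h_1 + 1/h_2 + 1/h_3}$
  18. $\hat{y}_i = M_{k,1}(i)\,W_1 + M_{k,2}(i)\,W_2 + \bar{y}_i\,W_3$
  19. else
  20. $h_1 = \frac{h_1}{h_1 + h_2}$; $h_2 = \frac{h_2}{h_1 + h_2}$; $W_1 = 1 - h_1$; $W_2 = 1 - h_2$;
  21. $\hat{y}_i = M_{k,1}(i)\,W_1 + M_{k,2}(i)\,W_2$
  22. end if
  23. end if
  24. end for
  25. end for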
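The following Python sketch shows how the steps of Algorithm 1 could be coded for a single segment. It is an interpretation, not the authors' implementation: the function names, the 0-based indexing, the use of `None` for missing week-3 observations, the small constant `EPS` that avoids division by zero, and the equal-weight fallback when no recent observation exists are all assumptions made for illustration. The input values are synthetic.

```python
from typing import Optional, Sequence

EPS = 1e-9  # assumption: avoids division by zero when an error term is exactly 0


def inverse_error_weights(errors: Sequence[float]) -> list[float]:
    """Turn error terms into weights that sum to 1 (larger error, smaller weight)."""
    inv = [1.0 / max(e, EPS) for e in errors]
    total = sum(inv)
    return [v / total for v in inv]


def predict_speeds(
    y: Sequence[Optional[float]],  # week-3 observations, None when missing
    m1: Sequence[float],           # speeds modeled from week 1
    m2: Sequence[float],           # speeds modeled from week 2
    deep: int = 3,
) -> list[float]:
    """Sketch of the prediction loop for one segment (0-based index i)."""
    if deep < 3:
        raise ValueError("the extrapolation step needs three previous values")
    preds = []
    for i in range(len(m1)):
        # Lags (1..deep) whose week-3 observation is available.
        lags = [j for j in range(1, deep + 1) if i - j >= 0 and y[i - j] is not None]
        if i < deep or not lags:
            # First `deep` points: both models contribute equally. The same rule
            # is used here as an assumed fallback when no recent data exists.
            preds.append(0.5 * m1[i] + 0.5 * m2[i])
            continue
        # Mean absolute error of each model over the available recent lags.
        h1 = sum(abs(y[i - j] - m1[i - j]) for j in lags) / len(lags)
        h2 = sum(abs(y[i - j] - m2[i - j]) for j in lags) / len(lags)
        if all(y[i - j] is not None for j in (1, 2, 3)):
            # Linear extrapolation from the previous week-3 observations.
            y_bar_prev = y[i - 2] + (y[i - 2] - y[i - 3])
            y_bar = y[i - 1] + (y[i - 1] - y[i - 2])
            h3 = abs(y[i - 1] - y_bar_prev)
            w1, w2, w3 = inverse_error_weights([h1, h2, h3])
            preds.append(m1[i] * w1 + m2[i] * w2 + y_bar * w3)
        else:
            # Two-source case: w1 = h2 / (h1 + h2), i.e. one minus normalized h1.
            w1, w2 = inverse_error_weights([h1, h2])
            preds.append(m1[i] * w1 + m2[i] * w2)
    return preds


# Synthetic example (km/h): the last two week-3 observations are missing.
m1 = [20, 21, 22, 23, 24, 25]
m2 = [22, 22, 22, 22, 22, 22]
y = [21, 21, 22, 24, None, None]
preds = predict_speeds(y, m1, m2)
print(preds)
```

The sketch computes the weights directly from the reciprocals of the error terms, which gives the same result as normalizing the errors first, and it evaluates the three normalizations from the same unnormalized values rather than sequentially.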

In Algorithm 2, lines 3 to 10 compare the modeled and historical speed categories (from $w_1$ and $w_2$) with the historical category from $w_3$, to determine which is the most accurate. The number of hits of the models and of the historical data (for weeks 1 and 2) is stored in score, with sub-index from 1 to 4 for the four cases. In line 12, if the probability $P_{2,i}$ is greater than or equal to the selected threshold and $\mathit{score}_2 \ge \mathit{score}_1$, then $S_{2,i}$ is the predicted speed category. In line 14, if $P_{1,i} \ge \mathit{threshold}$ and $\mathit{score}_1 \ge \mathit{score}_2$, then the predicted speed category is $S_{1,i}$. If neither conditional (lines 12 and 14) evaluates to true, then in lines 16 to 18 the historical category with the greater score, $H_2$ or $H_1$, is selected as the predicted speed category.

Algorithm 2

Initial conditions: score1 = 0; score2 = 0; score3 = 0; score4 = 0; threshold ∈ [0.75,0.95];

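  1. for $i = 1$ to $i = N$
  2. if $i > 1$
  3. if $H_{3,i-1} == S_{1,i-1}$
  4. $\mathit{score}_1$++; end if
  5. if $H_{3,i-1} == S_{2,i-1}$
  6. $\mathit{score}_2$++; end if
  7. if $H_{3,i-1} == H_{1,i-1}$
  8. $\mathit{score}_3$++; end if
  9. if $H_{3,i-1} == H_{2,i-1}$
  10. $\mathit{score}_4$++; end if
  11. end if
  12. if ($P_{2,i} \ge \mathit{threshold} \wedge \mathit{score}_2 \ge \mathit{score}_1$)
  13. $S_{3,i} = S_{2,i}$; else
  14. if ($P_{1,i} \ge \mathit{threshold} \wedge \mathit{score}_1 \ge \mathit{score}_2$)
  15. $S_{3,i} = S_{1,i}$; else
  16. if $\mathit{score}_4 \ge \mathit{score}_3$
  17. $S_{3,i} = H_{2,i}$; else
  18. $S_{3,i} = H_{1,i}$; end if
  19. end if
  20. end if
  21. end for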
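The selection logic of Algorithm 2 can be sketched in Python as follows. This is an interpretation for illustration only: the function name `predict_categories`, the argument layout, the 0-based indexing, and the default threshold of 0.85 (an arbitrary value inside the stated interval) are assumptions, as is the reading that each probability belongs to the category predicted by the corresponding model. The input categories and probabilities are synthetic.

```python
from typing import Sequence


def predict_categories(
    s1: Sequence[int], p1: Sequence[float],  # week-1 model: category, probability
    s2: Sequence[int], p2: Sequence[float],  # week-2 model: category, probability
    h1: Sequence[int], h2: Sequence[int],    # historical categories, weeks 1 and 2
    h3: Sequence[int],                       # observed categories, week 3
    threshold: float = 0.85,                 # assumed value within [0.75, 0.95]
) -> list[int]:
    """Sketch of the category selection for week 3 (0-based index i)."""
    score1 = score2 = score3 = score4 = 0
    predicted = []
    for i in range(len(h3)):
        if i > 0:
            # Credit every source that matched the previous week-3 category.
            score1 += int(h3[i - 1] == s1[i - 1])
            score2 += int(h3[i - 1] == s2[i - 1])
            score3 += int(h3[i - 1] == h1[i - 1])
            score4 += int(h3[i - 1] == h2[i - 1])
        if p2[i] >= threshold and score2 >= score1:
            predicted.append(s2[i])   # confident week-2 model with the better record
        elif p1[i] >= threshold and score1 >= score2:
            predicted.append(s1[i])   # confident week-1 model with the better record
        elif score4 >= score3:
            predicted.append(h2[i])   # fall back to the better historical source
        else:
            predicted.append(h1[i])
    return predicted


# Synthetic example with speed categories 1 to 3.
s1, p1 = [1, 1, 2, 3], [0.9, 0.6, 0.9, 0.7]
s2, p2 = [1, 2, 2, 3], [0.8, 0.9, 0.6, 0.7]
h1, h2 = [1, 1, 2, 2], [1, 2, 3, 3]
h3 = [1, 2, 2, 3]
predicted = predict_categories(s1, p1, s2, p2, h1, h2, h3)
print(predicted)
```

Because both score comparisons use greater-than-or-equal, a tie between the two models favors the week-2 model, which is checked first; a tie between the historical sources likewise favors week 2.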


© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


José Gerardo Carrillo-González, Jacobo Sandoval-Gutiérrez and Francisco Pérez-Martínez (July 26th 2019). Procedure to Prepare and Model Speed Data Considering the Traffic Infrastructure, as Part of a Cyber-Physical System [Online First], IntechOpen, DOI: 10.5772/intechopen.88280. Available from:
