-
According to a World Health Organization (WHO) report published in 2018, an average of 1.35 million people per year are killed in road accidents[1], which are the leading cause of death among young people. Many of the casualties recorded in road accidents occur at signalized junctions[2, 3], especially at the onset of a yellow (amber) signal. Vehicles travelling on a road or lane with a speed limit of 60 km/h or higher enter a zone where the driver faces a dilemma of whether to stop or to cross the intersection. This zone is typically termed the problematic zone or dilemma zone: at the onset of the yellow phase, a vehicle in it can neither stop safely before the stop line nor clear the intersection. To avoid a right-angle collision in this zone, drivers brake harshly to stop just before the signalized intersection. However, this may lead to a rear-end crash when the following driver encounters the leading driver's sudden stopping decision. The situation becomes more complex when the traffic is heterogeneous, containing various types of vehicles. Owing to the number of accidents recorded in this zone, the issue has attracted many traffic safety researchers worldwide. In the last couple of decades, a substantial number of studies have identified strategies to minimize the number of vehicles trapped in this zone[3-6]. Initially, most researchers investigated the optimal yellow light duration to eliminate the problem[7, 8].
Later, researchers focused on identifying the factors that influence driving behaviour when approaching junctions[9, 10], such as speed limit, police presence, distraction, cell phone use, work zones, road curviness, road conditions (wet/rainy), and the presence of pedestrians and bicycles[11, 12]. Driving behaviour also depends on age, gender, aggressiveness, perception-reaction time, etc.[13, 14] A review of the influential factors affecting driving behaviour when approaching signalized intersections is presented by Jahangiri et al.[15] Recent works on road traffic accident hotspot analysis provide spatial analytic methods to identify roads with a high probability of accidents[16-18]. The significant factors and the approaches used to understand driving behaviour at signalized intersections are listed in Table 1. Because so many factors are involved, different researchers have examined different aspects of the driver's decision at the signalized intersection, using approaches such as statistical techniques, empirical modelling, stochastic modelling, probability distributions and machine learning methods[19, 36, 37].
| Factor names | Approaches* | References |
| --- | --- | --- |
| Age & Gender | SM, ST, PD, EM | [19-24] |
| Cell phone use | EM | [19, 25] |
| Speed limit | SM, EM, PD | [22, 23, 26] |
| Roadway grade | SM, EM, PD | [22-24] |
| Driver aggressiveness | ML | [27] |
| Vehicle type | ST, EM | [19, 27] |
| Approach speed | SM, ST, EM, ML | [19, 25, 28-30] |
| Perception-reaction time | ST, EM, SM, PD | [11, 22-24] |
| Acceleration/deceleration rate | SM, ST, EM | [24, 28-30] |
| Time to an intersection (TTI) at the onset of yellow | SM, ST, EM, ML | [15, 24, 28, 30-32] |
| Distance to an intersection (DTI) at the onset of yellow | SM, ST, EM, ML | [24, 25, 28-30] |
| Pavement and weather conditions (wet, rainy) | ML | [24, 27] |
| Presence of side-street vehicles, pedestrians, bicycles, or opposing vehicles waiting to turn left | ST | [33-35] |
* Different approaches: Statistical technique (ST), empirical modelling (EM), stochastic modelling (SM), probability distribution (PD), machine learning (ML)
Table 1. List of influential factors along with different approaches used to ascertain driving behaviour when approaching signalized intersections
Most researchers who implemented a machine learning approach either predicted red-light running violations or classified driving behaviour based on driver aggressiveness, approach speed, road conditions, or time/distance to an intersection. Various machine learning approaches (artificial neural network (ANN)[38], adaptive neuro-fuzzy inference system (ANFIS)[39], convolutional neural network (CNN)[40] and principal component analysis (PCA)[41]) were used to predict driving behaviour from the essential factors listed in Table 1. The support vector machine (SVM) has also found numerous applications[42-44]. In the research area of driving behaviour, support vector regression has been applied[45]. However, the classification of the driver's stopping behaviour at the signalized intersection using the SVM approach has not been reported in the open literature.
Therefore, the main objective of this research study is to develop a framework to identify the crucial distance for warning alerts based on vehicle speed. To meet this objective, the drivers' behaviour at the signalized intersection is studied to check whether the vehicle is stopped safely or not. Accordingly, a machine learning approach, SVM, is used to classify safe and unsafe stopping. The major outcomes of this analysis can be further used to assess the rear-end crash risk at signalized intersections, to seek effective engineering countermeasures and to lower the crash rates at high-risk locations.
-
Typically, traffic violations are monitored by video cameras installed at signalized intersections. However, these recordings do not capture the driving condition accurately, and they do not record the significant parameters of the car (longitudinal velocity and acceleration). A few drivers install video cameras in their vehicles, which are useful for investigating collisions and drivers' mistakes, but these cameras also do not record the car's longitudinal velocity, acceleration or the force applied to the brake. Since no live or recorded data capturing both the car's significant parameters and the driver's behaviour is publicly available, these parameters were obtained using a driving simulator in a distraction-free environment[46]. The data for this study came from the Southampton University Driving Simulator (SUDS). The simulator vehicle is a Jaguar XJ saloon with fully operational driver controls, as shown in Fig. 1. It is an interactive fixed-base driving simulator, and the road scenario is projected onto three screens giving the driver a 135-degree field of view. In the present study, the simulator was set up with automatic transmission to avoid differences in driving performance arising from drivers' experience with gear engagement and clutch control in a manual transmission. The simulator was used to investigate driving behaviour in different road conditions across rural roads. To characterize and capture driving behaviour in different driving scenarios, various vehicle process variables were monitored; the seven variables studied here are listed in Table 2. These variables can provide valuable insights for investigating driving behaviour in different road conditions.
Data collected from this driving simulator (under similar road conditions) is therefore very useful. The driving data is free of distractions, uneven road conditions and cell phone use, and it reflects the drivers' real driving behaviour.
Figure 1. Driving simulator with a virtual environment[47]
| Variables | Units |
| --- | --- |
| Time (elapsed) | s |
| Total distance travelled by driver (longitudinal) | m |
| Longitudinal acceleration | m/s2 |
| Longitudinal velocity | m/s |
| Throttle input (0 for no throttle input, 1 for full-throttle input) | − |
| Brake pedal force | lb |
| Current traffic light status (0: No signal; 1: Green; 2: Yellow; 3: Red) | − |
Table 2. List of parameters that are monitored on a simulator for investigating driving behaviour
The simulated test track is designed to mimic real-world scenarios and spans sections of rural, suburban and urban roads with a total distance of 20 km. It contains five signalized junctions: three rural, one suburban and one urban. The junctions are designed to replicate a real problem zone, in which drivers are exposed to the onset of the amber/red light while approaching the signalized junction at their own approach speed. To monitor the behaviour of different drivers, 50 drivers volunteered to drive the 20 km route, crossing all five signalized intersections. The driving performance data of each driver was recorded at a sampling interval of 0.1 s.
-
In this research study, features such as brake pedal force and throttle input are excluded because they do not contribute to distinguishing safe from unsafe stopping behaviour; a complete halt can be confirmed directly when the velocity becomes zero. The classification is also not based on time series modelling, so elapsed time is not used as a feature. Since the study focuses on driving behaviour at a signalized intersection, only the instances recorded after the onset of a yellow signal are used. The classification analysis is carried out for signalized intersections in the region where the speed limit is 80 km/h. Instead of the distance recorded from the start of the journey, the distance from the intersection (DFI) is calculated. The processed data used in the further analysis therefore comprises longitudinal velocity (m/s), longitudinal acceleration (m/s2) and distance from the intersection (m).
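The conversion from cumulative travelled distance to DFI can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name and the stop-line position (`stop_line_position_m`) are hypothetical, and the sample values are invented.

```python
def distance_from_intersection(total_distance_m, stop_line_position_m):
    """Convert cumulative travelled distance (m) into distance from the
    intersection (DFI, m) for each recorded sample.

    stop_line_position_m is the cumulative distance at which the stop line
    lies on the route (an assumed, hypothetical value here).
    """
    return [stop_line_position_m - s for s in total_distance_m]


# Invented samples: cumulative distance along the route, in metres.
travelled = [4800.0, 4850.0, 4900.0, 4920.0]
dfi = distance_from_intersection(travelled, stop_line_position_m=4920.0)
print(dfi)  # [120.0, 70.0, 20.0, 0.0]
```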
-
In this research study, data from 50 drivers were recorded on the simulator. Each drive was recorded over a total travel distance of 5 km and lasted approximately 20 min on average, with a time interval of 1 millisecond. There are five signalized junctions on the route, of which three are considered in this study. The data from the other two junctions showed the most erratic behaviour and were therefore excluded from the analysis. From the complete driving data, the instances at which the driver faces either a yellow or a red signal are extracted for the three intersections for all 50 drivers. Of the 50 drivers, 16 drove very smoothly at very low speed and stopped safely before the intersection. The remaining 34 drivers showed fluctuating driving behaviour when approaching the intersection; hence their data were considered for further analysis.
-
Among the significant parameters recorded (see Table 2), the longitudinal velocity and acceleration provide vital information for identifying each driving behaviour. However, no ground truth labelled as safe/unsafe stopping is available, so the labelling is done by visual interpretation. Among all the parameters, acceleration is the most crucial profile because it shows the accelerating and decelerating periods; hence the labels are generated from the acceleration profiles. If, at the onset of the red signal, the driver starts decelerating and keeps decelerating until stopping at the signalized intersection, the driving behaviour is labelled as safe stopping. If the driver begins decelerating, accelerates in between and then stops just before the signalized intersection, the driving behaviour is labelled as unsafe stopping. For all 34 drivers, the deceleration and acceleration trends were checked from 120 m up to the stop line, and the driving behaviour was labelled accordingly.
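The labelling rule can be approximated programmatically. The sketch below is a hedged Python illustration only: the authors labelled the profiles by visual interpretation, so the function name, the tolerance `tol` and the sample profiles are assumptions introduced here.

```python
def label_stopping(accelerations, tol=0.0):
    """Label an approach as +1 (safe stopping) or -1 (unsafe stopping).

    accelerations: longitudinal acceleration samples (m/s^2), ordered from
    far from the intersection to the stop line.
    The approach is treated as unsafe if the driver accelerates again
    (a > tol) after deceleration (a < -tol) has already started.
    tol is a hypothetical noise tolerance, not a value from the study.
    """
    decelerating_started = False
    for a in accelerations:
        if a < -tol:
            decelerating_started = True
        elif a > tol and decelerating_started:
            return -1  # acceleration in between decelerations: unsafe
    return 1  # continuous deceleration up to the stop: safe


# Invented profiles for illustration.
safe_profile = [0.2, -0.5, -1.0, -1.5, 0.0]
unsafe_profile = [-0.5, -1.0, 0.4, 0.3, -2.0]
print(label_stopping(safe_profile), label_stopping(unsafe_profile))  # 1 -1
```

The +1/−1 encoding matches the binary labels that the study later assigns to safe and unsafe stopping.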
-
To quantify the driver's decelerating/accelerating behaviour, a turning moment metric is calculated for longitudinal velocity and acceleration as (1) and (2), respectively. The turning moment weights each sample by how far the driver has advanced within a maximum distance of 200 m (the approximate distance from which the signal is visible), so that changes in longitudinal velocity and acceleration closer to the intersection, and hence changes in driving behaviour, contribute more.
$ {TM}_{v}=v\times (200-d) $ (1)
$ {TM}_{a}=a\times (200-d) $ (2)
where d, v and a are the driver's distance from the intersection, longitudinal velocity and longitudinal acceleration, respectively. $ {TM}_{v} $ (m2/s) and $ {TM}_{a} $ (m2/s2) are the turning moments for longitudinal velocity and acceleration.
After calculating the turning moment (TMv) of the driver at each instance, the sum of turning moments of velocity (SUM_TMv) is calculated for all the instances. The turning moment (TMa) of acceleration at each instance is calculated for accelerating and decelerating periods. The average turning moments for deceleration (AVG_TMa1) are calculated for all the instances in the deceleration period. Similarly, the average turning moments for acceleration (AVG_TMa2) is calculated for all the instances in the acceleration period. The flow chart of the whole study is presented in Fig. 2.
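As a rough illustration of these features, the following Python sketch computes SUM_TMv, AVG_TMa1 and AVG_TMa2 from (d, v, a) samples using Eqs. (1) and (2). It is not the authors' implementation: the function name and sample values are invented, no zone weighting is applied, and returning 0.0 for an empty period is an assumption.

```python
VISIBLE_DISTANCE_M = 200.0  # maximum distance used in Eqs. (1) and (2)


def turning_moment_features(samples):
    """Return (SUM_TMv, AVG_TMa1, AVG_TMa2) for one approach.

    samples: iterable of (d, v, a) tuples with distance from the
    intersection (m), longitudinal velocity (m/s) and acceleration (m/s^2).
    """
    samples = list(samples)
    tm_v = [v * (VISIBLE_DISTANCE_M - d) for d, v, a in samples]
    # Deceleration period: a < 0; acceleration period: a > 0.
    tm_dec = [a * (VISIBLE_DISTANCE_M - d) for d, v, a in samples if a < 0]
    tm_acc = [a * (VISIBLE_DISTANCE_M - d) for d, v, a in samples if a > 0]

    def average(values):
        # Assumption: an empty period contributes 0.0.
        return sum(values) / len(values) if values else 0.0

    return sum(tm_v), average(tm_dec), average(tm_acc)


# Invented samples: (d, v, a)
samples = [(100.0, 20.0, -1.0), (60.0, 16.0, 0.5), (50.0, 10.0, -2.0), (0.0, 0.0, 0.0)]
sum_tmv, avg_tma1, avg_tma2 = turning_moment_features(samples)
print(sum_tmv, avg_tma1, avg_tma2)  # 5740.0 -200.0 70.0
```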
-
Initially, the driving data of the 34 drivers are analyzed using Power BI software to visualize the driving behaviour trends. The driving behaviour is labelled as safe or unsafe stopping based on the acceleration/deceleration profiles. As shown in Fig. 3(a), driver-3 decelerates from 120 m away from the intersection until stopping at the intersection. Since the driver decelerates continuously until reaching the intersection and finally stops the vehicle, this type of driving behaviour is labelled as safe stopping. The data of driver-2 (Fig. 3(b)) show that the driver started decelerating from 140 m until 90 m (recorded as the first part of deceleration). Then, the driver accelerates up to 50 m (recorded as the first part of acceleration). The driver then decelerates up to 35 m (recorded as the second part of deceleration) and accelerates again for a while. Finally, the driver applies the brakes when close to the intersection, resulting in a sudden stop near the intersection line, which is unsafe for the following drivers. This type of behaviour is therefore labelled as unsafe stopping. In general, if the driver initially decelerates, then accelerates and decelerates again to stop the vehicle, the driving behaviour is labelled as unsafe stopping.
Figure 3. Driver′s driving profiles for (a) driver-3 (safe stopping) and (b) driver-2 (unsafe stopping)
For all 34 drivers, the deceleration and acceleration trends were verified over the distance from 120 m to the stop line, and all the driving behaviour was labelled accordingly for further classification. As discussed in Section 2.2, the turning moment (TM) parameters are calculated for the 34 drivers' data to obtain new classification features. Different weights are used when calculating the turning moments. The driving behaviour is divided into zones according to where the driver starts decelerating and accelerating. As shown in Figs. 3(a) and 3(b), the first deceleration/acceleration is taken as zone 1, the next acceleration/deceleration as zone 2, and the third deceleration/acceleration as zone 3. The weights (w1=1.0, w2=1.8, w3=2.5) are used for computing the TM values in the first, second and third zones, respectively. The calculated features, the sum of turning moments of velocity (SUM_TMv) and the average turning moment for deceleration (AVG_TMa1), are taken as input for SVM classification.
-
Machine learning is a suitable approach for predicting and classifying driving behaviour based on driving data. Since there are two classes in the present study (safe stopping and unsafe stopping), this is a binary classification problem. The support vector machine (SVM) is one of the machine learning approaches most suitable for classification problems[47, 48]. The method is used for classification and is also extensively used for regression analysis[42, 49, 50]. The SVM algorithm tries to find an optimal hyperplane that separates the data according to their classes.
Owing to the dynamics of the process, the data is often not linearly separable. SVM can therefore map the data into a high-dimensional feature space through a nonlinear mapping, and an optimal separating hyperplane is constructed in this space. Since this mapping involves a high computational cost, SVM uses kernel functions, which achieve the same classification while depending only on the input space variables. The main kernel functions available in SVM are linear, polynomial, sigmoid and radial basis function.
The kernel functions return the inner product between two points in a suitable feature space.
Kernel (or window function). It is defined as
$ K\left( \bar{x} \right) = \begin{cases} 1, & \text{if } \left\| \bar{x} \right\| \le 1 \\ 0, & \text{otherwise}. \end{cases} $
This function's value is 1 inside the closed ball of radius 1 centered at the origin and 0 otherwise.
Polynomial kernel. It is popular in SVM classification and commonly used in image processing. This kernel is mathematically expressed as
$ k\left( x_i, x_j \right) = \left( x_i \cdot x_j + 1 \right)^{d} $
where the exponent d is the degree of the polynomial.
Gaussian kernel. It is a general-purpose kernel used when there is no prior knowledge about the data. This kernel is mathematically expressed as
$ k\left( x, y \right) = \exp\left( -\frac{\left\| x - y \right\|^{2}}{2\sigma^{2}} \right). $
Gaussian radial basis function (RBF). It is a general-purpose kernel used when there is no prior knowledge about the data. This kernel is mathematically expressed as
$ k\left( x_i, x_j \right) = \exp\left( -\gamma \left\| x_i - x_j \right\|^{2} \right) \; \text{for} \; \gamma > 0. $
Sometimes, $ \gamma = 1/(2\sigma^{2}) $, in which case the RBF coincides with the Gaussian kernel.
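The polynomial and Gaussian/RBF kernels can be evaluated directly from their definitions. The Python sketch below is illustrative only; the function names and default parameter values are assumptions and are not taken from LIBSVM or from the study.

```python
import math


def polynomial_kernel(x, y, degree=2):
    """Polynomial kernel (x . y + 1)^degree for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2), with gamma > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel, i.e. the RBF kernel with gamma = 1 / (2 sigma^2)."""
    return rbf_kernel(x, y, gamma=1.0 / (2.0 * sigma ** 2))


print(polynomial_kernel([1, 2], [3, 4], degree=2))  # (3 + 8 + 1)^2 = 144
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))           # identical points give 1.0
```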
Before applying the SVM approach, the instances need to be labelled in a binary format. In this study, the drivers' data is classified into two classes, safe and unsafe stopping, which are assigned the binary labels +1 and −1, respectively. Accordingly, all the 25 drivers' data are labelled in the binary format and supplied as input files to the SVM. LIBSVM[51, 52] is called from Matlab code to classify the driving behaviour. The LIBSVM options used to train the model are -s, -t and -c, where c is the positive cost factor in the SVM optimization problem that penalizes misclassified training examples. A larger c discourages misclassification more than a smaller c. Here, c = 1 and c = 100 were used for training.
model = svmtrain(trainlabels, trainfeatures, '-s 0 -t 0 -c 1');
The last string argument tells LIBSVM to train using the options:
-s 0: SVM classification
-t 0: linear kernel
-c 1: cost factor of 1.
-
The performance of a machine learning model depends on the input data used to train it. The input data has to be chosen randomly to eliminate bias towards a particular group of data. For this purpose, the whole sample is usually divided into k equal-sized subsets (also known as folds), and the procedure is called k-fold cross-validation. Of these, k−1 subsamples are used to train the model and the remaining one subsample is used as the validation data for testing the model. The splitting of the data into k subsets is entirely random; hence, for every repetition, different samples are grouped into the training data. This approach is commonly practiced in machine learning and data-driven modelling. A predictive model is developed using the training data set and is then validated on the testing data set. In this research study, five-fold cross-validation is used to randomize the data for training and testing the models.
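A five-fold split of this kind can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' Matlab/LIBSVM procedure: the function name and the fixed `seed` are hypothetical, and 50 is used only because the study classifies 50 instances.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds.

    Each sample index appears in exactly one test fold; the other k-1 folds
    form the training set for that split.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(50, k=5))
for train, test in splits:
    print(len(train), len(test))  # 40 10 for every fold
```

Repeating the procedure with a different `seed` gives a different random grouping, which is what repeated cross-validation runs rely on.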
-
The respective statistical metrics (accuracy, precision, recall and F-measure) are computed to validate the model's predicted results. Accuracy is the ratio of correctly labelled instances to the whole pool of instances, which makes it a very intuitive metric. Precision is the ratio of correctly labelled positive instances to all instances labelled positive; it is also referred to as positive predictive value. Recall is the fraction of actual positive instances that are correctly identified; it answers a query such as: out of all the instances where drivers stopped safely, how many are correctly predicted? This metric is also referred to as sensitivity.
Finally, F-measure is the harmonic mean of recall and precision.
All the above statistical performance metrics are mathematically evaluated as
$ {\eta }_{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} $ (3)
$ {\eta }_{Precision}=\frac{TP}{TP+FP} $ (4)
$ {\eta }_{Recall}=\frac{TP}{TP+FN} $ (5)
$ {\eta }_{F-measure}=2\times\frac{Precision\times Recall}{Precision+Recall} $ (6)
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
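Equations (3) to (6) translate directly into code. The Python sketch below uses invented confusion-matrix counts purely for illustration; they are not results from this study, and the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f_measure) from Eqs. (3)-(6).

    Zero denominators are mapped to 0.0, which is an assumption made here
    to keep the sketch safe for degenerate inputs.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, precision, recall, f_measure


# Invented counts for 15 test instances (positive class = safe stopping).
accuracy, precision, recall, f_measure = classification_metrics(tp=8, tn=5, fp=1, fn=1)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f_measure, 3))
```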
-
To classify the driver data as safe or unsafe stopping, the linear SVM approach is implemented. The performance of the SVM depends on the amount of data used for training. As mentioned earlier, the drivers' data is extracted and analyzed for three intersections, giving 102 instances for the 34 drivers. In some of these instances, the drivers drove slowly through the intersection and showed no major erratic behaviour. As these instances do not carry significant information, they were not considered in the further analysis. In around 50 instances, however, the driver accelerates or decelerates within a short span, which may lead to an accident. Therefore, only these 50 instances are used for the classification analysis. To check the performance of the SVM approach, different separations of the training data are used in Sections 3.2−3.4.
-
The data is separated into two data sets: 70% is used to train the SVM model, and the trained model is then applied to the remaining 30% to validate how accurately it classifies. The training dataset includes 35 instances of drivers' data, and the test dataset includes 15 instances.
The SVM model is trained on the 70% training dataset, and the results are presented in Fig. 4, in which safe stopping instances are marked with green circles and unsafe stopping instances with red circles. It can be observed that the SVM classifier selects a classification boundary that separates most of the safe and unsafe stopping data in the chosen training dataset. The training accuracy over the 35 instances was around 85.7%, and the testing accuracy over the remaining 15 instances was 86%. Among the 15 testing instances, only two instances representing unsafe stopping are incorrectly classified as safe stopping; the remaining 13 instances are correctly classified.
Figure 4. SVM classification for training data with cost c = 1: (a) Training the model; (b) Testing the model.
In this analysis, the string argument (-s 0 -t 0 -c 1) is passed to LIBSVM to train the model. The classification shown in Fig. 4 uses a linear SVM classifier with cost c = 1. It can be observed that the hyperplane does not separate the two classes completely, although the decision boundary appears to be a good fit. To check the effect of the cost, its value was increased to 100; the classification results are shown in Fig. 5. The classification accuracy increased, so the higher value of c (100) is preferred for this data set.
-
The 50 instances are randomized for training and testing using five-fold cross-validation (CV): in each split, 4/5 of the folds are used for training and 1/5 for testing. To further understand the effect of randomization on the classification accuracy, the runs are repeated 30 times (iterations). Of the 50 instances, 35 are labelled as safe stopping and the remaining 15 as unsafe stopping. Fig. 6 shows the accuracy versus the number of repeated 5-fold CV runs. Since different data is used to train the model in each iteration, the accuracy also differs between iterations, and this variation indicates how sensitive the model is to the choice of training data. The average training accuracy over the 30 repetitions is 97.08%, as shown in Fig. 6. The best model (the one with the highest accuracy) was then validated on the remaining test data, which resulted in 100% accuracy, as shown in Fig. 7. These results demonstrate that the identified model is useful for determining the driving behaviour outcome at the intersection.
Figure 7. SVM classification on (a) training and (b) testing data using model obtained at the 30th iteration
The classification performance is measured in terms of all the statistical metrics, which are listed in Table 3. In Table 3, the first two methods use the 70%−30% rule, where 35 instances are used for training the model and the trained model is tested on the remaining 15 instances. The third method uses 5-fold CV repeated 30 times. It can be observed that for a cost c of 1, the accuracy for training and testing is 85.7% and 86%, respectively.
| SVM methods | Accuracy (%) | Recall (%) | F-measure (%) |
| --- | --- | --- | --- |
| 70−30% linear (c=1) | Training = 85.7%, Testing = 86% | Training = 77%, Testing = 84% | Training = 78%, Testing = 84% |
| 70−30% linear (c=100) | Training = 97%, Testing = 93% | Training = 96%, Testing = 91% | Training = 86%, Testing = 85% |
| 5 CV linear | Training = 97.1%, Testing = 100% | Training = 97%, Testing = 100% | Training = 83%, Testing = 82% |
Table 3. Performance metrics for training data
In contrast, for c=100, the accuracy for training and testing is 97% and 93%, respectively, and the average accuracy for 5-fold CV with 30 repetitions is 97.1% for training and 100% for testing. High accuracy indicates how well the model classifies overall. Recall is also high and shows a similar pattern to accuracy, which indicates that few safe-stopping instances are missed. For example, a recall of 90% means that 9 in every 10 actual safe-stopping instances are classified as safe by the SVM and 1 is labelled as unsafe. Since the F-measure is the harmonic mean of precision and recall, an F-measure of 90% indicates that precision and recall are both high.
-
To compare the SVM with other machine learning techniques, K-nearest neighbors (KNN)[53] and linear discriminant techniques[54] were also applied. The training and testing accuracy of the different techniques is shown in Table 4. It can be observed that the SVM approach classified the safe and unsafe stopping data with the highest training and testing accuracy of the three techniques.
| Classification techniques | Training | Testing |
| --- | --- | --- |
| SVM | 97.4% | 93% |
| KNN | 94.7% | 89% |
| Linear discriminant | 92.1% | 90% |
Table 4. Comparisons of different classification techniques for training and testing
-
After analyzing the 50 driving instances, it was found that the stopping behaviour at the signalized intersection is unsafe in 15 instances and safe in the remaining 35. The most significant changes appear in the velocity (SUM_TMv) parameter. The unsafe stopping data are analyzed further to determine a suitable distance from the intersection at which a warning alert can be given to drivers based on their present velocity and longitudinal acceleration. In this regard, a tentative alarm can be given to drivers crossing a particular distance according to their driving speed at an intersection, as shown in Fig. 8. In Fig. 8, the green bar shows safe driving behaviour and the red bar shows unsafe driving behaviour. The blue mark shows a warning alert to the driver at the point where the driving behaviour changes from safe to unsafe as the driver approaches the intersection at a particular speed. It can be observed that the warning alert (∆) for the 15 unsafe instances appears at different distances from the intersection. For most instances, the warning alert is at around 50 m. However, for instances 1 and 14, the warning alert is given at a distance of 40 m from the intersection, which puts the driver, fellow passengers and following vehicles in a dangerous situation. Hence, these results differentiate the driving behaviour leading to safe and unsafe stopping as drivers approach the intersection. This analysis provides a framework to identify the crucial distance at which warning alerts should be put in place if the vehicle speed is higher than the prescribed speed limit of the road.
-
This research study investigated driving behaviour by analyzing 50 instances of driving data at three signalized intersections. A support vector machine approach was implemented to classify the driving behaviour as safe stopping or unsafe stopping at a signalized junction. Different scenarios were run to verify the classification performance. The SVM classification using random segregation (5-fold CV) with multiple repetitions resulted in an accuracy of 97.08%. The tentative distance at which drivers should be warned that they may have entered the problem zone is identified to be around 80 m. In the absence of a warning system, the driver must brake harshly to stop before the signalized intersection and avoid a red light violation. However, this action has a high potential to lead to a rear-end crash when the following driver encounters the leading driver's sudden stopping decision. Thus, the outcomes of this research can potentially be used to assess rear-end crash risk at signalized intersections, to seek effective engineering countermeasures and to reduce crash rates at high-risk locations.
-
This work was supported by Universiti Brunei Darussalam under the University Bursary Scholarship and Universiti Brunei Darussalam's Research Grants (Nos. UBD/PNC2/2/RG/1(311) and UBD/RSCH/1.11/FICBF/2018/002).
Identification and Classification of Driving Behaviour at Signalized Intersections Using Support Vector Machine
- Received: 2021-01-21
- Accepted: 2020-01-01
- Published Online: 2021-04-07
-
Key words:
- Signalized intersection /
- driving behaviour /
- machine learning /
- support vector machine (SVM) /
- road accidents
Abstract: When drivers approach a signalized intersection at the onset of the yellow signal, they enter a zone in which they are uncertain whether they can stop or should cross the intersection. Any improper decision might lead to a right-angle or back-end crash. To avoid a right-angle collision, drivers brake harshly to stop just before the signalized intersection, but this may lead to a back-end crash when the following driver encounters the leading driver's sudden stopping decision. The situation becomes more complex when the traffic is heterogeneous, containing various types of vehicles. To reduce this problem, the primary objective of this study is to identify the driving behaviour at signalized intersections based on the driving features (parameters). The secondary objective is to classify the outcome of driving behaviour (safe stopping and unsafe stopping) at the signalized intersection using a support vector machine (SVM) technique. Turning moments are used to identify the zones and label them accordingly for further classification. Classifying 50 instances using a 70%−30% rule for training and testing resulted in an accuracy of 85% and 86%, respectively. Classification performance is further verified by random sampling using five-fold cross-validation and 30 iterations, which gave an accuracy of 97% and 100% for training and testing, respectively. These results demonstrate that the proposed approach can help develop a pre-warning system to alert drivers approaching signalized intersections, thus reducing back-end crashes and accidents.
Citation: S. L. Karri, L. C. D. Silva, D. T. C. Lai, S. Y. Yong. Identification and classification of driving behaviour at signalized intersections using support vector machine. International Journal of Automation and Computing. http://doi.org/10.1007/s11633-021-1295-y doi: 10.1007/s11633-021-1295-y