Ranking crash hotspots is an important step in highway safety management decision support. Ranking hotspots incorrectly can lead to inefficient allocation of limited resources.
While traditional methods rely on non-parametric approaches to rank different groups, this paper proposes a new hybrid clustering method that combines a Genetic Algorithm with an Ordered Logit Model (HGA-OLM).
Unlike non-parametric data grouping techniques such as K-Means, K-Nearest Neighbors (KNN), and Support Vector Machines, HGA-OLM finds the best boundary value for each data cluster by maximizing the log-likelihood of the grouped dataset. The Ordered Logit Model is a logistic regression model for ranked (ordinal) response variables.
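To make the objective concrete, the following minimal sketch shows how a set of boundaries turns continuous values into ranks, and how an ordered logit log-likelihood can be evaluated for those ranks. It is an illustration under stated assumptions, not the paper's implementation: the function names, the single explanatory variable `x`, the coefficient `beta`, and the thresholds `taus` are all hypothetical, and in practice `beta` and `taus` would be estimated by maximum likelihood rather than supplied by hand.

```python
import math
from bisect import bisect_right


def assign_ranks(values, boundaries):
    """Map each value to a rank 0..K-1 given K-1 ascending boundaries.

    A value equal to a boundary is placed in the higher group.
    """
    return [bisect_right(boundaries, v) for v in values]


def logistic(z):
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordered_logit_loglik(x, ranks, beta, taus):
    """Log-likelihood of an ordered logit with one covariate.

    Assumes P(rank <= k | x) = logistic(taus[k] - beta * x), where
    taus holds K-1 ascending thresholds (hypothetical, user-supplied).
    """
    ll = 0.0
    for xi, k in zip(x, ranks):
        upper = 1.0 if k == len(taus) else logistic(taus[k] - beta * xi)
        lower = 0.0 if k == 0 else logistic(taus[k - 1] - beta * xi)
        # Guard against log(0) when a category has vanishing probability.
        ll += math.log(max(upper - lower, 1e-300))
    return ll


# Illustrative values only: three costs split into three ranks.
ranks = assign_ranks([5.0, 10.0, 25.0], [10.0, 20.0])  # -> [0, 1, 2]
ll = ordered_logit_loglik([0.2, 0.5, 0.9], ranks, beta=1.0, taus=[0.3, 0.8])
```

A higher (less negative) log-likelihood indicates that the ranks implied by the boundaries are better explained by the model, which is the quantity a boundary search would try to maximize.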
For the Ordered Logit Model, the log-likelihood value measures how well the assigned ranks fit the model. The Genetic Algorithm is used to maximize the log-likelihood of the Ordered Logit Model and thereby provide the optimal boundaries for each data group.
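The boundary search can be sketched as a small genetic algorithm over sorted boundary vectors. This is a simplified illustration, not the authors' algorithm: the operators (truncation selection, uniform crossover, Gaussian mutation, elitism), the parameter defaults, and all function names are assumptions. In HGA-OLM the fitness would be the maximized log-likelihood of an Ordered Logit Model fitted to the ranks implied by the boundaries; to keep the sketch dependency-free, a stand-in fitness (negative within-group sum of squares) is used here and should be replaced by the model's log-likelihood in a real application.

```python
import random
from bisect import bisect_right


def genetic_boundary_search(values, n_groups, fitness, pop_size=30,
                            generations=40, mutation_rate=0.2, seed=0):
    """Search for n_groups - 1 ascending boundaries that maximize fitness."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    n_cuts = n_groups - 1

    def random_individual():
        return sorted(rng.uniform(lo, hi) for _ in range(n_cuts))

    def crossover(a, b):
        # Uniform crossover; re-sort so boundaries stay ascending.
        return sorted(rng.choice(pair) for pair in zip(a, b))

    def mutate(ind):
        # Gaussian perturbation, clipped to the observed data range.
        scale = 0.1 * (hi - lo)
        out = [min(hi, max(lo, g + rng.gauss(0.0, scale)))
               if rng.random() < mutation_rate else g for g in ind]
        return sorted(out)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]   # truncation selection
        children = [scored[0]]              # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    return max(population, key=fitness)


def make_stand_in_fitness(values):
    """Placeholder fitness: negative within-group sum of squares.

    NOT the ordered logit log-likelihood; used only so the sketch runs.
    """
    def fitness(boundaries):
        groups = {}
        for v in values:
            groups.setdefault(bisect_right(boundaries, v), []).append(v)
        sse = 0.0
        for g in groups.values():
            mean = sum(g) / len(g)
            sse += sum((v - mean) ** 2 for v in g)
        return -sse
    return fitness


# Illustrative data only: three well-separated clumps of values.
data = [1, 2, 3, 50, 51, 52, 100, 101, 102]
best = genetic_boundary_search(data, n_groups=3,
                               fitness=make_stand_in_fitness(data))
groups = [bisect_right(best, v) for v in data]
```

Passing the fitness as a callable keeps the search independent of the model, so the same loop can be reused once a fitted log-likelihood is available.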
A case study evaluating the effectiveness of HGA-OLM ranks highway segments by crash cost to determine the operational strategy of a Safety Service Patrol (SSP). Crash records collected from 2014 to 2016 along the entire stretch of Interstate 78 (I-78) in New Jersey are used to calculate the crash cost for each 0.1-mile segment of I-78.
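The per-segment aggregation can be illustrated as follows. This is a minimal sketch under assumptions: the record layout (milepost, severity), the severity categories, and especially the unit costs are hypothetical placeholders and are not values from the case study.

```python
from collections import defaultdict

# Hypothetical unit costs per crash by severity (placeholders only).
UNIT_COST = {"fatal": 1_000_000.0, "injury": 100_000.0, "pdo": 10_000.0}


def segment_index(milepost, seg_len=0.1):
    """Index of the fixed-length segment containing a milepost.

    The small epsilon guards against floating point error at segment
    edges (for example, 0.3 / 0.1 evaluates to slightly less than 3).
    """
    return int(milepost / seg_len + 1e-9)


def crash_cost_by_segment(crashes, seg_len=0.1):
    """Sum crash costs per segment from (milepost, severity) records."""
    totals = defaultdict(float)
    for milepost, severity in crashes:
        totals[segment_index(milepost, seg_len)] += UNIT_COST[severity]
    return dict(totals)


# Illustrative records only.
records = [(0.05, "pdo"), (0.07, "injury"), (0.30, "fatal")]
costs = crash_cost_by_segment(records)
```

The resulting per-segment totals are the values that a grouping method would then divide into ranked classes.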
The case study examines the statistical significance of the clustered datasets by comparing the log-likelihood values obtained from the K-Means method and from HGA-OLM under multiple grouping scenarios. The results consistently show that the statistical significance of datasets grouped by HGA-OLM is higher than that of datasets grouped by the K-Means method.