A Natural Language Processing method for the identification of the factors influencing road accident severity



Published Jun 29, 2021
Dario Valcamonico Piero Baraldi Francesco Amigoni Enrico Zio


Although road safety has improved over recent decades, the rate of accidents with severe and fatal consequences still exceeds the safety objectives (European Commission 2019; World Health Organization 2018).
This work explores the use of Natural Language Processing (NLP) techniques for the automatic extraction of knowledge from road accident reports, with the objective of supporting the safety management of the road infrastructure system (Persia et al. 2016).
To this end, we consider databases of textual road accident reports provided by local public authorities. These reports contain descriptions of the accidents and the results of the post-accident investigations. The reports are analyzed by NLP to extract the features that most influence accident severity, so as to inform road safety management.
For the analysis of the reports, we develop a method that combines Hierarchical Dirichlet Processes (HDPs) (Teh et al. 2006), Artificial Neural Networks (ANNs) and a feature selection technique based on the Sequential Forward Selection (SFS) strategy (Marcano-Cedeño et al. 2010). HDPs represent each report as a mixture of topics, i.e. probability distributions over words that tend to co-occur in the reports. In practice, each report is transformed into a vector whose elements are its degrees of membership to the topics, i.e. measures of how much each topic contributes to the description in the report. ANNs then classify the reports, represented by these vectors, into classes characterizing the severity of the accident consequences. Finally, SFS identifies the topics that most influence the classification of the reports. In this way, the factors that cause the accidents and influence their evolution are extracted automatically. The method is validated on a database of real accident reports.
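The SFS step can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: the topic-membership vectors are hypothetical toy values standing in for HDP output, the ANN is replaced by a leave-one-out nearest-centroid classifier so the code runs with the standard library only, and the search stops as soon as no remaining topic improves the score (a common SFS stopping rule, assumed here). The function names `centroid_accuracy` and `sequential_forward_selection` are hypothetical.

```python
"""Sequential Forward Selection (SFS) over topic-membership features.

Hypothetical sketch: a leave-one-out nearest-centroid classifier stands in
for the ANN, and the topic vectors below are toy values, not real HDP output.
"""
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Scorer = Callable[[Matrix, Sequence[str], List[int]], float]


def centroid_accuracy(X: Matrix, y: Sequence[str], features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for j in range(len(X)):
            if j == i:
                continue  # hold out report i
            acc = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def sq_dist(label: str) -> float:
            # Squared Euclidean distance from report i to the class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sums, key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def sequential_forward_selection(X: Matrix, y: Sequence[str],
                                 score: Scorer) -> Tuple[List[int], float]:
    """Greedily add the topic that most improves `score`; stop when none does."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = float("-inf")
    while remaining:
        trial = {f: score(X, y, selected + [f]) for f in remaining}
        candidate = max(remaining, key=trial.__getitem__)  # ties: lowest index
        if trial[candidate] <= best:
            break  # no remaining topic improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = trial[candidate]
    return selected, best


# Hypothetical degrees of membership to 3 topics for six reports (rows sum to 1).
X = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.9, 0.0, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
    [0.2, 0.0, 0.8],
]
y = ["severe"] * 3 + ["minor"] * 3

selected, best = sequential_forward_selection(X, y, centroid_accuracy)
print(selected, best)  # prints: [0] 1.0
```

Because the scorer is passed in as a function, replacing `centroid_accuracy` with the validation accuracy of a trained classifier such as the ANN described above would leave the greedy search loop unchanged.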

European Commission. 2019. “EU Road Safety Policy Framework 2021-2030 - Next Steps towards ‘Vision Zero.’” Brussels, 19.6.2019, SWD(2019) 283 final.
Marcano-Cedeño, A, J Quintanilla-Domínguez, M G Cortina-Januchs, and D Andina. 2010. “Feature Selection Using Sequential Forward Selection and Classification Applying Artificial Metaplasticity Neural Network.” In IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society, Glendale, AZ, 2845–50. https://doi.org/10.1109/IECON.2010.5675075.
Persia, Luca, Davide Shingo Usami, Flavia De Simone, Véronique Feypell-De La Beaumelle, George Yannis, Alexandra Laiou, et al. 2016. “Management of Road Infrastructure Safety.” Transportation Research Procedia 14: 3436–45. https://doi.org/10.1016/j.trpro.2016.05.303.
Teh, Yee Whye, Michael I Jordan, Matthew J Beal, and David M Blei. 2006. “Hierarchical Dirichlet Processes.” Journal of the American Statistical Association 101 (476): 1566–81. https://doi.org/10.1198/016214506000000302.
World Health Organization. 2018. “Global Status Report on Road Safety.” Geneva: World Health Organization.

How to Cite

Valcamonico, D., Baraldi, P., Amigoni, F., & Zio, E. (2021). A Natural Language Processing method for the identification of the factors influencing road accident severity. PHM Society European Conference, 6(1), 12. https://doi.org/10.36001/phme.2021.v6i1.2899



Keywords: Road Safety, Natural Language Processing, Feature Selection

Technical Papers