Abstract
Terrorist attacks occur frequently worldwide, so research on their early warning, prevention, and control is necessary. Using the Fragile States Index and the Global Terrorism Database (GTD) for 2006-2016, six machine learning models were applied to predict, by regression, the risk of terrorist attacks faced by countries around the world. The results show that the Random Forest, K-Nearest Neighbors, and Decision Tree models perform best, with coefficients of determination (R²) of 0.75, 0.74, and 0.67, respectively. The Random Forest predictions are generally consistent with the actual situation, and are especially accurate for the Middle East and Central Asia, where terrorist attacks are frequent. According to the feature importance ranking, Security Apparatus, Public Services, Human Rights and Rule of Law, and Group Grievance contribute most strongly to the predictions.
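The pipeline described above (random forest regression on country-level indicators, evaluated by R², followed by a feature importance ranking) can be sketched as follows. This is a minimal illustration, not the paper's actual code: the data here are synthetic stand-ins for the twelve Fragile States Index indicators, and the target is a hypothetical risk score, so only the workflow, not the reported R² values, is reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 country-year samples x 12 FSI-style indicators
X = rng.random((500, 12))
# Hypothetical risk target driven mainly by the first three indicators
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest regression, as in the paper's best-performing model
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Goodness of fit on held-out data (coefficient of determination R^2)
r2 = r2_score(y_test, model.predict(X_test))

# Rank indicators by impurity-based feature importance,
# analogous to the paper's importance ranking of FSI indicators
ranking = np.argsort(model.feature_importances_)[::-1]
```

On real data the same ranking step would surface which indicators (e.g. Security Apparatus, Public Services) drive the predictions; here the synthetic informative features simply come out on top.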