Machine-learning classifiers for imbalanced tornado data
  • Authors: Theodore B. Trafalis ; Indra Adrianto…
  • Keywords: Machine learning ; Support vector machines ; Random forest ; Rotation forest ; Logistic regression ; Tornado detection ; 62H30 ; 68Q32 ; 62J86
  • Journal: Computational Management Science
  • Publication year: 2014
  • Publication date: October 2014
  • Volume: 11
  • Issue: 4
  • Pages: 403-418
  • Full-text size: 729 KB
  • Author affiliations: Theodore B. Trafalis (1)
    Indra Adrianto (1)
    Michael B. Richman (2)
    S. Lakshmivarahan (3)

    1. School of Industrial and Systems Engineering, The University of Oklahoma, Norman, OK, 73019, USA
    2. School of Meteorology, The University of Oklahoma, Norman, OK, 73019, USA
    3. School of Computer Science, The University of Oklahoma, Norman, OK, 73019, USA
  • ISSN:1619-6988
Abstract
Learning from imbalanced data, where one class has significantly more observations than the other, has gained considerable attention in the machine learning community. Assuming the difficulty in predicting each class is similar, most standard classifiers will tend to predict the majority class well. This study uses tornado data, which are highly imbalanced because tornadoes are rare events. In the severe weather data used herein, thunderstorm circulations (mesocyclones) produce tornadoes in approximately 6.7% of the total number of observations. However, since tornadoes are high-impact weather events, it is important to predict the minority class with high accuracy. In this study, we apply support vector machines (SVMs) and logistic regression, each with and without a midpoint threshold adjustment on the probabilistic outputs, as well as random forest and rotation forest, for tornado prediction. Feature selection with SVM-recursive feature elimination was also performed to identify the most important features or variables for predicting tornadoes. The results showed that the threshold adjustment on SVMs provided better performance compared to other classifiers.
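
To illustrate why a threshold adjustment on probabilistic outputs can help with a rare class, the sketch below compares the default 0.5 cutoff with an adjusted one on a small synthetic set of scores. The abstract does not define the midpoint rule, so the rule used here (the midpoint between the mean predicted probability of each class) is an assumption made for illustration only. The function names and the data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def midpoint_threshold(probs, labels):
    """Assumed rule: midpoint between the class-mean predicted probabilities."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def confusion(probs, labels, threshold):
    """Return (hits, misses, false_alarms, correct_negatives) at a cutoff."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            hits += 1
        elif predicted == 0 and y == 1:
            misses += 1
        elif predicted == 1 and y == 0:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


# Synthetic scores: 3 rare-class cases (label 1) and 9 majority-class cases.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.70, 0.40, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]

default_counts = confusion(probs, labels, 0.5)        # (1, 2, 0, 9)
threshold = midpoint_threshold(probs, labels)         # about 0.294
adjusted_counts = confusion(probs, labels, threshold) # (3, 0, 1, 8)

print("default 0.5 cutoff:", default_counts)
print("adjusted cutoff %.3f:" % threshold, adjusted_counts)
```

On this toy data the lower cutoff recovers all three rare-class cases at the cost of one false alarm, which is the kind of trade-off a threshold adjustment is meant to expose. Real scores would come from a fitted classifier, and the cutoff would be chosen on training or validation data rather than on the cases being evaluated.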
