Abstract
A Python crawler was used to collect information on 63,264 Kernels from 302 competitions held on Kaggle over the past eight years. The collected data cover competition organizers, competition tasks, data mining tools, algorithm usage, and application fields, and were analyzed visually using word clouds, Sankey diagrams, and other charts. The analysis shows that: (1) Python is currently the most commonly used programming language in data mining, and Keras is the most widely used machine learning toolkit; (2) the hottest research directions in data mining include machine vision and natural language processing; (3) current hot application fields include medical and health care, public management, retail, e-commerce, finance, culture and entertainment, surveying and remote sensing, insurance, and autonomous driving; (4) popular data mining algorithms include random forests, neural networks, and boosting algorithms.
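The kind of tally behind findings such as "most used language" and "most used toolkit" can be illustrated with a minimal sketch. It assumes each crawled Kernel has already been reduced to a record with a language and a list of imported libraries; the records, field names, and the `tally` helper are hypothetical and are not taken from the study's actual crawler or data.

```python
from collections import Counter

# Hypothetical kernel records; field names and values are illustrative only
# and do not come from the study's dataset.
kernels = [
    {"competition": "A", "language": "Python", "libraries": ["keras", "numpy"]},
    {"competition": "A", "language": "R", "libraries": ["xgboost"]},
    {"competition": "B", "language": "Python", "libraries": ["keras", "sklearn"]},
]


def tally(records):
    """Count how many kernels use each language and each library."""
    languages = Counter(r["language"] for r in records)
    # set() ensures a library is counted at most once per kernel.
    libraries = Counter(lib for r in records for lib in set(r["libraries"]))
    return languages, libraries


languages, libraries = tally(kernels)
print(languages.most_common(1))  # most frequent language with its count
print(libraries.most_common(1))  # most frequent library with its count
```

Frequency tables of this form could then feed a word cloud or a Sankey diagram, under the assumption that the plotting step takes label and count pairs as input.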