Research on Application Hotspots of Data Mining: Based on Kaggle Competition Data
  • Original title (Chinese): 数据挖掘应用热点研究——基于Kaggle竞赛数据
  • Authors: Deng Zhonghua (邓仲华); Liu Bin (刘斌)
  • Keywords: data mining; machine vision; natural language processing; medical and health care
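  • Journal (Chinese title): TSSS
  • Journal (English title): Research on Library Science
  • Affiliation: School of Information Management, Wuhan University
  • Publication date: 2019-03-25
  • Publisher: Research on Library Science (图书馆学研究)
  • Year: 2019
  • Issue: No. 449
  • Funding: One of the research outputs of the National Natural Science Foundation of China project "Information Resource Cloud for the Fourth Paradigm of Scientific Research in the Big Data Environment" (Project No. 71373191)
  • Language: Chinese
  • Page: TSSS201906001
  • Number of pages: 9
  • CN: 06
  • ISSN: 22-1052/G2
  • Classification number: 4-11+25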
Abstract
Information on 63,264 Kernels from 302 competitions held on Kaggle over the past eight years was crawled with Python. The data cover competition organizers, competition tasks, data mining tools, algorithm usage, and application domains, and were analyzed visually with word clouds, Sankey diagrams, and other charts. The analysis shows that: (1) Python is currently the most widely used programming language in data mining, and Keras is the most widely used machine learning toolkit; (2) the hottest research directions in data mining mainly include machine vision and natural language processing; (3) current hot application fields of data mining mainly include medical and health care, public management, retail, e-commerce, finance, culture and entertainment, surveying and remote sensing, insurance, and autonomous driving; (4) popular data mining algorithms mainly include random forests, neural networks, and boosting algorithms.
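The frequency analysis summarized in the abstract (which language and which toolkit appear most often across Kernels) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the field names (`competition`, `language`, `libraries`), the `tally` helper, and the three toy records are all hypothetical and stand in for whatever the crawler actually collected.

```python
from collections import Counter

# Hypothetical kernel records. Field names and values are illustrative only
# and are not taken from the paper's dataset.
kernels = [
    {"competition": "comp-a", "language": "Python", "libraries": ["keras", "numpy"]},
    {"competition": "comp-a", "language": "R", "libraries": ["xgboost"]},
    {"competition": "comp-b", "language": "Python", "libraries": ["keras", "sklearn"]},
]


def tally(records):
    """Count how many kernels use each language and each library."""
    languages = Counter(r["language"] for r in records)
    # set() ensures a library listed twice in one kernel is counted once.
    libraries = Counter(lib for r in records for lib in set(r["libraries"]))
    return languages, libraries


languages, libraries = tally(kernels)
print(languages.most_common(1))  # most frequent language in the toy data
print(libraries.most_common(1))  # most frequent library in the toy data
```

Counts of this kind could then feed a word cloud or Sankey diagram; which plotting tools the study used is not stated in the abstract.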