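Research on a Big Data-Based Analysis and Service Platform for College Students

English title: Research on big data platform for college students analysis and service
Authors: 石敏; 卢丹海; 秦婷
English authors: SHI Min; LU Dan-hai; QIN Ting; Xi'an University of Posts & Telecommunications
Keywords: Hadoop; Spark; college student analysis; big data
English keywords: Hadoop; Spark; college student analysis; big data
Chinese journal name: HDZJ
English journal name: Information Technology
Institution: Xi'an University of Posts & Telecommunications (西安邮电大学)
Publication date: 2019-02-20
Publisher: 信息技术 (Information Technology)
Year: 2019
Funding: Natural Science Special Project of the Shaanxi Provincial Department of Education (15JK1671); Xi'an University of Posts & Telecommunications Graduate Education and Teaching Reform Research Project (YJGJ201627)
Language: Chinese
Pages: HDZJ201902002
Page count: 6
CN: 02
ISSN: 23-1557/TN
Classification number: 13-18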

Abstract

A distributed big data platform for comprehensive student analysis and information services is built by combining Hadoop and Spark. The platform collects multi-source, heterogeneous real-time and non-real-time data produced by the digitalization of colleges and universities, stores it in a distributed manner, analyzes it with parallel computation, and displays the results. It provides services such as student behavior portraits, behavior tracks, course warnings, and intelligent recommendations. The system integrates analysis and service, offers high security and strong parallelization capability, and makes college student management more intelligent and its decision-making more scientific and standardized.

