Research on Some Key Technologies of Educational Resource Recommendation Service

English title: Research on Some Key Technologies of Educational Resource Recommendation Service
Author: Wang Long (王龙)
Degree level: Doctoral
Major: Computer Application Technology
Keywords: educational resource; recommendation service; high-frequency word extraction; active learning; transfer learning; distributed learning
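Degree year: 2013
Supervisor: Liu Yanheng (刘衍珩)
Discipline code: 081203
Degree-granting institution: Jilin University
Submission date: 2013-05-01
Defense committee chair: Zheng Siqing (郑斯淸)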

Abstract

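This dissertation discusses and studies key technologies of educational resource recommendation services, namely the recommendation service model, resource feature representation, and recommendation algorithms based on machine learning. The main work and contributions are as follows: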
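     Based on an analysis of the characteristics and shortcomings of existing network education systems, a complete design scheme for a network education system is proposed. Web mining technology is applied to the network education system, and an educational resource recommendation service is introduced into it.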
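A minimal sketch of how a Web-mining-based recommendation service of this kind could work, assuming a simple two-stage design: usage mining aggregates the resources a student has visited into an interest profile, and content mining ranks unseen resources by their similarity to that profile. The log format, the term-weight vectors, and the names `extract_profile` and `recommend` are hypothetical, and cosine similarity is an assumed matching measure, not necessarily the one used in the dissertation.

```python
import math
from collections import Counter

# Hypothetical resource features: resource id -> {term: weight}.
RESOURCES = {
    "r1": {"java": 1.0, "oop": 1.0},
    "r2": {"java": 1.0, "thread": 1.0},
    "r3": {"calculus": 1.0, "limit": 1.0},
    "r4": {"oop": 1.0, "design": 1.0},
}

# Hypothetical access log: (student id, resource id) pairs taken from Web server logs.
LOG = [("s1", "r1"), ("s1", "r2"), ("s2", "r3")]


def extract_profile(log, student):
    """Stage 1 (usage mining): sum the term weights of every resource the student visited."""
    profile = Counter()
    for sid, rid in log:
        if sid == student:
            profile.update(RESOURCES[rid])  # Counter.update adds the mapping's values
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(log, student, k=2):
    """Stage 2 (content mining): rank resources the student has not seen by similarity to the profile."""
    profile = extract_profile(log, student)
    seen = {rid for sid, rid in log if sid == student}
    scored = [(cosine(profile, feats), rid)
              for rid, feats in RESOURCES.items() if rid not in seen]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties broken by id
    return [rid for _, rid in scored[:k]]


print(recommend(LOG, "s1"))  # ['r4', 'r3']
```

Student s1 has read two Java resources, so the object-oriented design resource r4 (sharing the term "oop") ranks above the unrelated calculus resource r3.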
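     To address feature representation of educational resource content, the content of text resources, or the resource description of multimedia resources, is used to represent the resource content. A Chinese high-frequency word extraction algorithm based on a tree structure and weighted entropy is proposed. It can extract high-frequency words from resource content or resource descriptions without a dictionary, and these high-frequency words serve as the feature representation of the resource.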
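Dictionary-free word extraction is often done by combining substring frequency with the entropy of the characters adjacent to each substring: a real word occurs often and in varied contexts, while a fragment such as 教育 inside 教育资源 is always followed by the same character. The sketch below illustrates that general idea only and is not the dissertation's algorithm: a plain dictionary stands in for the tree structure, the entropy is unweighted because the abstract does not specify the weighting scheme, and the thresholds `max_len`, `min_freq`, and `min_entropy` are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of a neighbor-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def extract_high_freq_words(text, max_len=4, min_freq=2, min_entropy=0.5):
    """Keep frequent substrings whose left and right neighbors both vary (high boundary entropy)."""
    freq = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    # Punctuation and whitespace are treated as hard word boundaries.
    for seg in re.split(r"[\s，。；、！？,.;!?]+", text):
        n = len(seg)
        for i in range(n):
            for j in range(i + 2, min(i + max_len, n) + 1):  # substring lengths 2..max_len
                s = seg[i:j]
                freq[s] += 1
                left[s][seg[i - 1] if i > 0 else "^"] += 1  # "^" marks segment start
                right[s][seg[j] if j < n else "$"] += 1     # "$" marks segment end
    words = []
    for s, f in freq.items():
        if f < min_freq:
            continue
        score = min(entropy(left[s]), entropy(right[s]))
        if score >= min_entropy:
            words.append((s, f, score))
    words.sort(key=lambda w: (-w[1], -w[2], w[0]))  # by frequency, then boundary entropy
    return words


TEXT = "教育资源推荐，共享教育资源，教育资源平台"
print(extract_high_freq_words(TEXT))
```

On this sample, 教育, 资源, and the three-character fragments each occur three times but are rejected because one of their neighbors never varies, so only 教育资源 is extracted.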
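     To reduce the training time of the recommendation model and improve the efficiency of the recommendation service, manifold learning is used to reduce the dimensionality of the resource feature representation, which shortens model training time. A recommendation method based on active learning is also used, which reduces resource labeling time and improves the quality and efficiency of the recommendation service.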
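Pool-based active learning with uncertainty sampling is one standard way to reduce labeling effort: the model requests labels only for the samples it is least sure about. The sketch assumes a nearest-centroid classifier and a one-dimensional toy pool for readability; `oracle` stands for a hypothetical user who marks a resource as interesting (1) or not (0). None of these choices is taken from the dissertation.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each labeled class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    return min(model, key=lambda c: math.dist(x, model[c]))


def margin(model, x):
    """Gap between the distances to the two closest centroids; a small gap means uncertainty."""
    d = sorted(math.dist(x, m) for m in model.values())
    return d[1] - d[0]


def active_learning(pool, oracle, seed, budget):
    """Uncertainty sampling: `seed` must contain at least one sample of each class."""
    labeled = list(seed)
    for _ in range(budget):
        model = fit_centroids([pool[i] for i in labeled], [oracle(i) for i in labeled])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the label of the least certain sample only.
        labeled.append(min(unlabeled, key=lambda i: margin(model, pool[i])))
    model = fit_centroids([pool[i] for i in labeled], [oracle(i) for i in labeled])
    return model, labeled


POOL = [[x] for x in (0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0)]
LABELS = [1 if p[0] > 0.5 else 0 for p in POOL]  # hypothetical user judgments
model, asked = active_learning(POOL, lambda i: LABELS[i], seed=[0, 9], budget=2)
print(len(asked), [predict(model, p) for p in POOL])  # 4 [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Four labels (two seeds plus two queries next to the class boundary) are enough here to classify all ten pool samples correctly.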
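     For cross-domain educational resource recommendation, a transfer learning algorithm combining data timeliness and weight constraints is proposed. A timeliness function is introduced into the weight assignment of the classic transfer learning algorithm TrAdaBoost so that the timeliness of the sample data is reflected. During execution, weight constraints are applied to misclassified samples, which improves the generalization ability of the algorithm.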
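The sketch below shows where a timeliness function and a weight constraint could enter TrAdaBoost's weight updates. The base loop (error measured on target samples only, β_t = ε_t / (1 − ε_t), a fixed down-weighting factor for misclassified source samples, and a vote over the second half of the learners) follows Dai et al.'s TrAdaBoost. The exponential decay exp(−decay · age) on initial source weights and the cap on misclassified samples' weights are hypothetical stand-ins, because the abstract does not give the dissertation's exact functions. Decision stumps serve as the weak learner.

```python
import math


def stump_predict(stump, x):
    """Decision stump: predict 1 when polarity * (x[j] - threshold) >= 0, else 0."""
    j, thr, pol = stump
    return 1 if pol * (x[j] - thr) >= 0 else 0


def train_stump(X, y, p):
    """Weak learner: the stump with the lowest weighted error under weights p."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(pi for xi, yi, pi in zip(X, y, p)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best


def tradaboost_timeliness(Xs, ys, ages, Xt, yt, n_iter=10, decay=0.1, cap=5.0):
    """TrAdaBoost with two hypothetical extensions: source weights start at
    exp(-decay * age), and misclassified samples' weights are capped at
    `cap` times the mean weight after each update."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [math.exp(-decay * a) for a in ages] + [1.0] * m
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iter))
    learners = []
    for _ in range(n_iter):
        total = sum(w)
        h = train_stump(X, y, [wi / total for wi in w])
        miss = [abs(stump_predict(h, xi) - yi) for xi, yi in zip(X, y)]
        # Error is measured on the target-domain samples only.
        eps = sum(w[n + i] * miss[n + i] for i in range(m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # clip so beta_t stays strictly inside (0, 1)
        beta_t = eps / (1.0 - eps)
        learners.append((h, beta_t))
        for i in range(n):
            w[i] *= beta_src ** miss[i]   # misclassified source samples lose weight
        for i in range(n, n + m):
            w[i] *= beta_t ** (-miss[i])  # misclassified target samples gain weight
        limit = cap * sum(w) / len(w)     # hypothetical weight constraint
        w = [min(wi, limit) if miss[i] else wi for i, wi in enumerate(w)]
    return learners


def tr_predict(learners, x):
    """Weighted vote of learners ceil(N/2)..N, as in TrAdaBoost (in log form)."""
    late = learners[math.ceil(len(learners) / 2) - 1:]
    score = sum(-math.log(b) * stump_predict(h, x) for h, b in late)
    return 1 if score >= 0.5 * sum(-math.log(b) for _, b in late) else 0


Xt, yt = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]], [0, 0, 0, 1, 1, 1]
Xs, ys = [[0.15], [0.25], [0.75], [0.85]], [0, 0, 1, 1]
ages = [0, 1, 2, 3]  # hypothetical time units since each source sample was collected
learners = tradaboost_timeliness(Xs, ys, ages, Xt, yt)
print([tr_predict(learners, x) for x in Xt])  # [0, 0, 0, 1, 1, 1]
```

On this separable toy data the first stump already has zero target error, so the example only exercises the mechanics; the decay and cap terms matter when stale source samples disagree with the target domain.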
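     For large-scale educational resource recommendation, a distributed neural network learning algorithm based on the supervised Hebb rule is proposed and applied to the educational resource recommendation service. The algorithm effectively addresses the problems caused by large sample sets, such as an excessively large network and long training time.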
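A sketch of why the supervised Hebb rule distributes naturally: the weight change ΔW = η t xᵀ is a sum of per-sample terms, so each worker can learn on one data shard and the shard matrices can simply be added. This illustrates the general principle only and is not the dissertation's algorithm; the bipolar targets, round-robin sharding, the names `hebb_partial` and `distributed_hebb`, and the thread pool (standing in for separate machines) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def hebb_partial(samples, n_classes, dim, lr=1.0):
    """Supervised Hebb rule on one shard: W[c][j] += lr * t_c * x_j,
    with bipolar targets t_c = +1 for the true class and -1 otherwise."""
    W = [[0.0] * dim for _ in range(n_classes)]
    for x, label in samples:
        for c in range(n_classes):
            t = 1.0 if c == label else -1.0
            for j in range(dim):
                W[c][j] += lr * t * x[j]
    return W


def merge(partials):
    """Hebb updates are additive, so summing the shard matrices gives the
    same weights as one pass of the rule over the whole dataset."""
    n_classes, dim = len(partials[0]), len(partials[0][0])
    return [[sum(P[c][j] for P in partials) for j in range(dim)]
            for c in range(n_classes)]


def distributed_hebb(data, n_workers, n_classes):
    dim = len(data[0][0])
    shards = [data[k::n_workers] for k in range(n_workers)]  # round-robin split
    # Threads stand in for separate worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(hebb_partial, shards, repeat(n_classes), repeat(dim)))
    return merge(partials)


def predict(W, x):
    """Return the class whose weight row gives the largest activation."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(W)), key=scores.__getitem__)


DATA = [([1, 1, -1], 0), ([1, -1, -1], 0), ([-1, 1, 1], 1), ([-1, -1, 1], 1)]
W = distributed_hebb(DATA, n_workers=2, n_classes=2)
print(W)                                 # [[4.0, 0.0, -4.0], [-4.0, 0.0, 4.0]]
print([predict(W, x) for x, _ in DATA])  # [0, 0, 1, 1]
```

Because the merge is an exact sum, adding workers shortens wall-clock training without changing the learned weights, which is the property that makes large sample sets tractable.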
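     The results of this dissertation provide a theoretical reference for research on educational resource recommendation services and have theoretical and practical value for resource feature representation and machine-learning-based recommendation algorithms.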
English Abstract

With the development of network technology and educational informatization, educational resource management systems have been widely used at every stage of education. How to make educational resource systems more intelligent and improve their utilization efficiency has become a common concern.
     As the number of educational resources in network education systems grows, finding resources of interest becomes increasingly difficult. Adding an educational resource recommendation service to the network education system frees students from information overload. It also shifts the focus of the network education system from resources to students and raises network services to a higher level. Under these conditions and demands, educational resource recommendation service technology has gradually developed.
     In this dissertation, we study the key technologies of educational resource recommendation services, such as the service model, resource feature representation, and recommendation algorithms. The main work and contributions of this dissertation are as follows:
     1. Based on an analysis of the characteristics and shortcomings of current network education systems, a complete design for a network education system is proposed, and an educational resource recommendation service based on Web mining technology is added to it. The service model is given. The recommendation process is divided into two stages: personalized information extraction and educational resource recommendation. For the personalized information extraction stage, a personalized information extraction method based on Web usage mining is presented. For the educational resource recommendation stage, an educational resource recommendation method based on Web content mining is presented.
     2. To address resource feature representation, the content of text resources or the description of multimedia resources is used to represent resource content. A Chinese high-frequency word extraction algorithm based on a tree structure and weighted entropy is presented. The algorithm can extract Chinese high-frequency words from the representation of resource content without the support of a dictionary. These high-frequency words are used as the resource feature representation.
     3. To reduce training time, manifold learning is used to reduce the dimensionality of the resource feature representation. To improve the efficiency of the recommendation service, active learning is used in the recommendation. The recommendation algorithm needs label information, which accumulates during use; active learning reduces the labeling time and so improves the efficiency of the recommendation service.
     4. To address cross-domain educational resource recommendation, a transfer learning algorithm combining data timeliness and weight constraints is presented. A timeliness function is added to the weight distribution process so that the algorithm reflects the timeliness of the data, and a weight constraint operation is added to improve its generalization capability.
     5. To address large-scale resource recommendation, a distributed recommendation algorithm based on supervised Hebb learning rules is presented. Simulation results show that the algorithm can solve the problems caused by large datasets, such as large network scale and long training time.
     The conclusions of this dissertation provide academic references for research on educational resource recommendation services and broaden its scope. They also advance research on feature representation and recommendation algorithms.

