A Feature Processing Method Based on ListNet Learning to Rank

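English title: A Feature Processing Method Based on Ranking Algorithm ListNet
Authors: 李伟宁; 王磊
Authors (English): LI Wei-ning; WANG Lei; School of Computer, Nanjing University of Posts and Telecommunications; School of Electronic Science and Engineering, Nanjing University of Posts and Telecommunications
Keywords: information retrieval; learning to rank; feature processing; ListNet
Keywords (English): information retrieval; learning to rank; feature selection; ListNet
Journal name (Chinese): WJFZ
Journal name (English): Computer Technology and Development
Affiliations: School of Computer, Nanjing University of Posts and Telecommunications; School of Electronic Science and Engineering, Nanjing University of Posts and Telecommunications
Publication date: 2018-04-27 17:35
Publisher: 计算机技术与发展 (Computer Technology and Development)
Year: 2018
Issue: v.28; No.257
Funding: National "863" High Technology Development Program project (2006AA01Z201)
Language: Chinese
Pages: WJFZ201809007
Page count: 5
CN: 09
ISSN: 61-1450/TP
Classification number: 36-39+43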

Abstract

Learning to rank lies at the intersection of machine learning and information retrieval; it learns a ranking model automatically from large labeled training sets. Feature selection strongly affects the predictions of a ranking model, yet the learning-to-rank literature contains little work on the feature side. To address this gap, we propose a feature processing method: the data set is first extended through feature recombination based on principal component analysis (PCA), and the feature selection implicit in the ranking algorithm is then carried out on the extended data set. We validate the method with the ListNet ranking algorithm on the LETOR4.0 data sets (MQ2007, MQ2008) using ranking evaluation measures, comparing ranking performance before and after feature processing and examining how the number of added features affects the ranking results. The experiments show that ranking functions learned with the proposed feature processing generally outperform the original ones.
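
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the PCA-based recombination is carried out, so here the top-k principal-component projections are simply appended as extra feature columns. ListNet is written with its top-one probability (softmax) cross-entropy loss and a linear scorer, and the data is synthetic rather than LETOR4.0. The names `pca_extend`, `listnet_loss_and_grad`, `train_listnet`, and `total_loss` are hypothetical.

```python
import numpy as np


def pca_extend(X, k):
    """Append the top-k principal-component projections of X as new columns.

    Assumption: this "append projections" scheme stands in for the paper's
    feature recombination, whose details the abstract does not give.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return np.hstack([X, Xc @ components.T]), mean, components


def top_one_prob(scores):
    """Top-one probability of each document in one query (a softmax)."""
    z = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return z / z.sum()


def listnet_loss_and_grad(w, X, y):
    """ListNet top-one cross entropy for one query, with linear scores X @ w."""
    p_true = top_one_prob(y.astype(float))
    p_model = top_one_prob(X @ w)
    loss = -np.sum(p_true * np.log(p_model + 1e-12))
    grad = X.T @ (p_model - p_true)  # gradient of softmax cross entropy w.r.t. w
    return loss, grad


def total_loss(w, queries):
    """Sum of per-query ListNet losses."""
    return sum(listnet_loss_and_grad(w, X, y)[0] for X, y in queries)


def train_listnet(queries, n_features, lr=0.01, epochs=50):
    """Plain per-query gradient descent; lr and epochs are illustrative values."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for X, y in queries:
            _, grad = listnet_loss_and_grad(w, X, y)
            w -= lr * grad
    return w


# Synthetic stand-in for a LETOR-style set: 4 queries x 10 documents, 5 features.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(40, 5))
true_w = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
y_all = np.clip(np.round(X_all @ true_w), 0, 2)  # graded relevance labels 0..2

X_ext, mean, comps = pca_extend(X_all, k=2)  # 5 original + 2 new features
queries = [(X_ext[i:i + 10], y_all[i:i + 10]) for i in range(0, 40, 10)]

loss_before = total_loss(np.zeros(X_ext.shape[1]), queries)
w = train_listnet(queries, X_ext.shape[1])
loss_after = total_loss(w, queries)
print(f"ListNet loss before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

In a real experiment the PCA mean and components would be fitted on the training fold only and then applied unchanged to the validation and test folds. Varying `k` corresponds to the abstract's question of how the number of added features affects ranking quality.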

