Research on Online Advertisement Click-Through Rate Prediction Based on Multi-Class Features
Abstract
The click-through rate (CTR) of online advertisements is an important quantitative indicator for both search engine providers and advertisers, so CTR prediction is one of the key problems in computational advertising. Industry and academia have studied the problem continuously, and every search engine provider has built its own CTR prediction system, which shows that the topic has both theoretical and practical value.
     This thesis builds a complete modeling pipeline for CTR prediction. It first studies the characteristics of search engine online advertising (sponsored search) and summarizes five major characteristics. On this basis, it defines explicit and implicit advertisement features and extracts them. It then introduces the probabilistic relational model into the feature selection stage, dividing features into three classes: directly related, indirectly related, and completely unrelated to the true result. Next, it adopts the factorization machine as the prediction model, whose input is the real-valued feature vector that remains after feature selection. Finally, the predictions are evaluated by the area under the curve (AUC).
     Notably, feature extraction in current research mainly emphasizes position and advertisement attribute features, and gives little consideration to the scenario in which an advertisement is triggered or to the relationship between the advertisement and the user's query. Existing methods that predict CTR from advertisement category features use the mean CTR of the advertisements in the same category directly as the prediction, so the category cannot be combined with other features to strengthen the prediction. Clustering the advertisements directly also yields categories, but only a single category label per advertisement, which this thesis defines as the uni-class feature. Online advertisements are inherently multi-topic, so a single category label loses its meaning across different user query behaviors. This thesis therefore proposes a CTR prediction method based on multi-class advertisement features: it defines how user query behavior triggers advertisements, extracts the multi-class features of each advertisement through indirect clustering, and feeds these features into the factorization machine to predict CTR. Experimental results show that multi-class features, added on top of the basic features, clearly improve prediction accuracy. Compared with direct clustering, the indirect clustering used during feature extraction not only assigns multiple category labels to each advertisement but also effectively reduces the dimensionality of large-scale sparse feature vectors and significantly lowers the time cost of clustering.
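The prediction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes the standard second-order factorization machine formulation (Rendle, ICDM 2010) and a rank-based AUC; the function names, the logistic link in `predict_ctr`, and the parameter layout are illustrative assumptions, not the thesis's actual implementation.

```python
import math


def fm_score(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x  : list of n real-valued features
    w0 : global bias
    w  : list of n linear weights
    V  : n rows, each a list of k latent factors
    The pairwise term uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    which costs O(k * n) instead of O(k * n^2).
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def predict_ctr(x, w0, w, V):
    """Map the FM score to (0, 1) with a logistic link (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


def auc(labels, scores):
    """AUC as the fraction of (clicked, not clicked) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    so suitable only for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch the multi-class features would simply be additional entries of `x`; because AUC depends only on the ranking of the scores, the choice of link function in `predict_ctr` does not affect the evaluation.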
