An effective early-stage movie box-office prediction model based on GBRT
  • English title: Effective box-office revenue prediction model based on GBRT
  • Authors: Han Zhongming (韩忠明); Yuan Bihong (原碧鸿); Chen Yan (陈炎); Zhao Ning (赵宁); Duan Dagao (段大高)
  • Affiliations: College of Computer & Information Engineering, Beijing Technology & Business University; Beijing Key Laboratory of Big Data Technology for Food Safety
  • Keywords: gradient boosting regression tree (GBRT); early factors of film; box-office prediction; influence measurement
  • Journal (Chinese abbreviation): JSYJ
  • Journal (English): Application Research of Computers
  • Publication date: 2017-08-18 17:02
  • Publisher: Application Research of Computers (计算机应用研究)
  • Year: 2018
  • Issue: v.35; No.316
  • Funding: National Natural Science Foundation of China (61170112); Beijing Natural Science Foundation (4172016); Humanities and Social Sciences Research Fund of the Ministry of Education of China (13YJC860006)
  • Language: Chinese
  • Page: JSYJ201802021
  • Page count: 7
  • CN: 02
  • ISSN: 51-1196/TP
  • Classification number: 96-102
Abstract
Predicting movie box-office revenue is challenging, especially at an early stage. Existing approaches, such as those based on social media, suffer from low accuracy and are difficult to apply early. This paper therefore proposes an early box-office prediction model based on the gradient boosting regression tree (GBRT). Nine factors that influence box-office revenue, including actors, directors, release date, and companies, are converted into features using different methods, such as influence measurement of social network nodes and binning of average box-office weights into intervals. The resulting 34 features serve as the predictor variables of a GBRT model fitted against box-office revenue. Extensive experiments on 1,875 movies released from 2000 to 2015, together with the associated 8,203 film professionals and 3,300 companies, show that the model predicts well: relative accuracy reaches 80.6%, and for a selection of new movies released in 2016 the prediction error is within 10%.
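The core idea of fitting a GBRT to numeric features can be illustrated with a minimal sketch. It is not the authors' implementation: it uses depth-1 regression trees (stumps) with squared loss, three synthetic features instead of the paper's 34, and made-up revenue values. The function names (`fit_stump`, `fit_gbrt`, `predict`, `relative_accuracy`) are hypothetical, and the relative-accuracy formula (one minus the mean relative absolute error) is an assumption, since the abstract does not define the metric.

```python
import random

def fit_stump(X, residuals):
    """Least-squares depth-1 regression tree: returns (feature, threshold, left_value, right_value)."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Minimizing squared error is equivalent to maximizing this gain.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # every feature is constant: predict the mean residual
        mean = total / n
        return 0, float("inf"), mean, mean
    return best[1:]

def fit_gbrt(X, y, n_trees=200, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, left, right = fit_stump(X, residuals)
        stumps.append((j, t, left, right))
        pred = [p + learning_rate * (left if x[j] <= t else right)
                for p, x in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(left if x[j] <= t else right
                                      for j, t, left, right in stumps)

def relative_accuracy(y_true, y_pred):
    """Assumed metric: 1 - mean(|prediction - actual| / actual)."""
    errors = [abs(p - t) / t for t, p in zip(y_true, y_pred)]
    return 1 - sum(errors) / len(errors)

# Synthetic stand-in data (NOT the paper's dataset): three features in [0, 1].
random.seed(0)

def make_movie():
    x = [random.random() for _ in range(3)]
    revenue = 100 + 80 * x[0] + 40 * (x[1] > 0.5) + random.gauss(0, 5)
    return x, revenue

data = [make_movie() for _ in range(300)]
X_train, y_train = [x for x, _ in data[:200]], [r for _, r in data[:200]]
X_test, y_test = [x for x, _ in data[200:]], [r for _, r in data[200:]]

model = fit_gbrt(X_train, y_train)
test_pred = [predict(model, x) for x in X_test]
print("relative accuracy on held-out data:", round(relative_accuracy(y_test, test_pred), 3))
```

In practice a library implementation with deeper trees and subsampling would be the more natural choice; the stumps here only keep the sketch short and dependency-free.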