Abstract
To assess the value of air routes in the civil aviation route network, this paper analyzes traditional topic models and proposes an LDA-based model for mining the potential value of air routes. The model divides the analysis of passenger travel behavior into two stages: determining the travel intention, and selecting an air route under that intention. These two stages are then combined with passenger value to mine the potential value of air routes. Travel intentions are inferred from passenger travel records by Gibbs sampling; once a passenger's travel intention is determined, the air route is obtained from that intention's route vector; passenger value is extracted from the cabin-class information of each trip. Experiments on China civil aviation passenger booking data sets show that the two route potential value sequences obtained by the proposed model on the 2010 and 2011 data sets have better rank correlation than the two sequences obtained by either the pLSI model or the senLDA model. Moreover, when mining the top 5 air routes by potential value, the proposed model produces identical sequences on the two data sets, which indicates its advantage in mining air routes of high potential value.
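The two-stage process described above (infer intentions by Gibbs sampling, then pick a route from the intention's route vector, then weight by passenger value) can be illustrated with a minimal sketch. It uses a standard collapsed Gibbs sampler for LDA, treating each passenger as a document and each booked route as a word. The abstract does not give the paper's exact formula for fusing passenger value, so the aggregation in `route_potential_value`, the hyperparameters `alpha` and `beta`, the toy booking data, and the cabin-derived weights are all assumptions made for illustration, not the authors' method.

```python
import random

def gibbs_lda(docs, n_topics, n_routes, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA.

    docs[p] is the list of route ids booked by passenger p.
    Returns theta (passenger -> intention) and phi (intention -> route).
    """
    rng = random.Random(seed)
    n_pk = [[0] * n_topics for _ in docs]              # passenger-intention counts
    n_kr = [[0] * n_routes for _ in range(n_topics)]   # intention-route counts
    n_k = [0] * n_topics                               # bookings per intention
    z = []                                             # intention of each booking
    for p, doc in enumerate(docs):
        z_p = []
        for r in doc:
            k = rng.randrange(n_topics)                # random initial assignment
            z_p.append(k)
            n_pk[p][k] += 1
            n_kr[k][r] += 1
            n_k[k] += 1
        z.append(z_p)

    for _ in range(n_iter):
        for p, doc in enumerate(docs):
            for i, r in enumerate(doc):
                k = z[p][i]                            # remove current assignment
                n_pk[p][k] -= 1
                n_kr[k][r] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized
                weights = [
                    (n_pk[p][j] + alpha) * (n_kr[j][r] + beta) / (n_k[j] + n_routes * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[p][i] = k                            # record the new assignment
                n_pk[p][k] += 1
                n_kr[k][r] += 1
                n_k[k] += 1

    theta = [
        [(n_pk[p][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for p, doc in enumerate(docs)
    ]
    phi = [
        [(n_kr[k][r] + beta) / (n_k[k] + n_routes * beta) for r in range(n_routes)]
        for k in range(n_topics)
    ]
    return theta, phi

def route_potential_value(theta, phi, passenger_value):
    """Assumed fusion rule: sum over passengers of value * P(route | passenger)."""
    n_routes = len(phi[0])
    scores = [0.0] * n_routes
    for p, theta_p in enumerate(theta):
        for k, t in enumerate(theta_p):
            for r in range(n_routes):
                scores[r] += passenger_value[p] * t * phi[k][r]
    return scores

# Hypothetical toy data: 4 passengers, 4 routes (ids 0..3)
docs = [[0, 0, 1], [0, 1, 1], [2, 3, 3], [2, 2, 3]]
passenger_value = [3.0, 1.0, 1.0, 1.0]   # assumed weights derived from cabin class

theta, phi = gibbs_lda(docs, n_topics=2, n_routes=4)
scores = route_potential_value(theta, phi, passenger_value)
ranking = sorted(range(len(scores)), key=lambda r: -scores[r])
print(ranking)
```

Running the same procedure on two yearly data sets and comparing the resulting `ranking` lists with a rank-correlation measure would mirror the comparison described in the abstract, although the specific measure used in the paper is not stated here.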