Self-Similarity Modeling and Prediction of Article-Count Sequences in Online Forums
Abstract
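Since the Web 2.0 concept emerged, the fast-changing Internet has drawn growing attention, and online forums, blogs, and similar channels of communication have become popular Internet applications. On the one hand, hot-topic analysis is a major direction in online forum research. On the other hand, commercial decisions such as product market surveys or advertising placement call for a macro-level view of how forum or blog user participation changes; this is becoming a new research focus, and studies of how the number of articles in a forum varies over time have recently drawn attention as well. Accurate measurement and characterization of online forum properties is therefore fundamental to analyzing, understanding, and simulating forum dynamics, and to guiding the design of forum control schemes.
     This thesis takes the number of articles in an online forum as its object of study. It analyzes the properties of the article-count sequence, proposes models to describe it, and on that basis proposes a method for predicting it. First, real data are analyzed to verify that the article-count sequence is self-similar. Then, time-series methods for short-range and long-range dependence are used to model the self-similar article counts, and the methods and steps for modeling with a Markov model and a FARIMA model are given. To avoid the complexity of structure identification and parameter estimation in the FARIMA model, an improved model, λ-ARMA, is derived from the traditional ARMA model by analyzing the ARMA parameter estimation method. Experiments on real data verify that the FARIMA and λ-ARMA models are suitable for modeling article-count sequences.
     An important use of a model is prediction. Using real data, this thesis gives the prediction algorithm and its steps, and verifies experimentally how well the different models apply within different confidence intervals.
     The results show that analyzing online forums from the perspective of article counts is both necessary and feasible. This work will serve as a reference for building and managing next-generation networks, especially online forums and blogs.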
With the introduction of the Web 2.0 concept, the rapidly changing Internet is attracting more and more attention. Online forums, blogs, and other forms of communication have become popular Internet applications. On the one hand, hot-topic analysis is a major focus of web forum research; on the other hand, because business decisions (such as product market research or advertising) depend on it, the change in forum or blog user participation is becoming a new research hotspot, and the way the number of articles in a forum varies over time has also recently received attention. Accurate measurement and characterization of online forums is therefore essential for analyzing, understanding, and simulating their dynamic changes, and is of fundamental significance for guiding the design of forum control schemes.
     This paper studies the number of articles in an online forum. It analyzes the properties of the article-number sequence, proposes models that describe it, and on that basis proposes a method for forecasting it. First, analysis of actual data verifies the self-similarity of the article-number sequence. Then, time series analysis methods for short-range and long-range dependence are applied to model the self-similar article numbers, and the modeling methods and procedures for the Markov model and the FARIMA model are given. To avoid the complexity of structure identification and parameter estimation in the FARIMA model, the λ-ARMA model is proposed as an improvement on the traditional ARMA model. Experiments on real data verify the applicability of the FARIMA model and the λ-ARMA model to article-number sequences.
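The self-similarity check described above can be illustrated with a standard estimator. The following is a minimal sketch, assuming the aggregated-variance (variance-time) method; the abstract does not say which estimator the thesis uses, and the function name `aggregated_variance_hurst` and the choice of block sizes are hypothetical. For a self-similar series, the variance of block means of size m scales roughly as m^(2H-2), so the Hurst parameter H can be read from the slope of a log-log fit.

```python
import math


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst parameter H from the slope of
    log Var(block means of size m) versus log m (slope = 2H - 2).

    Illustrative helper, not the thesis's own procedure.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / (n_blocks - 1)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log_var against log_m.
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = (sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
             / sum((x - mx) ** 2 for x in log_m))
    return 1.0 + slope / 2.0
```

An estimate near 0.5 suggests no long-range dependence, while values between 0.5 and 1 are consistent with a self-similar, long-range dependent series. For a series of daily article counts, one might call `aggregated_variance_hurst(counts, [1, 2, 4, 8, 16, 32])`.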
     An important use of modeling is prediction. This paper gives the prediction algorithm and its steps based on real data, and validates experimentally the applicability of the different models within different confidence intervals.
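As an illustration of prediction with a long-memory model, here is a minimal sketch of one-step forecasting for the simplest case, FARIMA(0, d, 0), using a truncated AR(∞) representation of the fractional differencing operator (1 - B)^d. This is a textbook construction and not necessarily the algorithm given in the thesis; the function names, the `max_lag` truncation, and the assumption of a mean-centered series are all illustrative. For such a process, d relates to the Hurst parameter by H = d + 1/2.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k for k = 0..n_terms-1,
    computed with the recursion pi_k = pi_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def one_step_forecast(history, d, max_lag=100):
    """One-step-ahead forecast for a zero-mean FARIMA(0, d, 0) series.

    Uses X_t = -sum_{k>=1} pi_k X_{t-k} + e_t, truncated at max_lag
    (hypothetical default) or at the length of the available history.
    """
    lags = min(max_lag, len(history))
    w = frac_diff_weights(d, lags + 1)
    # history[-k] is the observation k steps before the forecast time.
    return -sum(w[k] * history[-k] for k in range(1, lags + 1))
```

With d = 1 the weights reduce to ordinary differencing, so the forecast equals the last observation; with d = 0 the series is white noise and the forecast is zero. A real article-count series would need its mean subtracted first and added back to the forecast.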
     The results show that analyzing online forums from the perspective of the number of articles is both necessary and feasible. This work will provide a reference for building and managing the next generation of networks, especially online forums and blogs.
