Frequent pagesets from web log by enhanced weighted association rule mining


Authors: S. P. Malarvizhi; B. Sathiyabhama
Keywords: Frequent pattern mining; Weight estimation; Weighted minimum support; WARM; T+weight tree; Web logs
Journal: Cluster Computing
Publication year: 2016
Publication date: March 2016
Volume: 19
Issue: 1
Pages: 269-277
Full-text size: 1,075 KB
Author affiliations: S. P. Malarvizhi (1)
B. Sathiyabhama (2)

1. Anna University, Chennai, Tamil Nadu, India
2. Sona College of Technology, Salem, India
Journal category: Computer Science
Journal subjects: Processor Architectures
Operating Systems
Computer Communication Networks
Publisher: Springer Netherlands
ISSN: 1573-7543

Abstract

Mining frequently visited web pages from web logs has become a pressing need in web usage mining for understanding user behavior. The frequent pageset mining and association rule mining (ARM) algorithms in the literature suffer from storage and run-time problems, because they mine all frequent pagesets that satisfy a minimum support threshold and all possible association rules that satisfy a minimum confidence threshold. For analyzing web usage, a more quality-oriented and useful form of mining can be performed by applying weighted ARM (WARM) to web logs. WARM reduces storage and run time because it mines frequent pages based on weighted support and association rules based on weighted confidence. The proposed T+weight tree algorithm gives importance to the dwelling time of the pages visited by users. Pages are assigned weights based on dwelling time, since dwelling time suggests that a page may have some significance and may have attracted users' interest. The T+weight tree algorithm finds frequent pagesets based on weights in a single scan of the database. Empirical results show that the proposed T+weight tree method takes less computational time than the other methods in the literature because it produces fewer, more significant pagesets.
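
To make the idea of weighting pages by dwelling time concrete, here is a minimal Python sketch. It is not the T+weight tree algorithm, whose details this record does not give; it is a brute-force illustration under stated assumptions. The sample sessions, the function names, the weight formula (mean dwell time per page, scaled so the largest weight is 1), and the weighted-support formula (mean page weight times the fraction of sessions containing the pageset) are all hypothetical choices made for the example.

```python
from itertools import combinations

# Hypothetical sessions: page -> dwell time in seconds for that visit.
SESSIONS = [
    {"home": 5, "products": 60, "cart": 30},
    {"home": 4, "products": 45},
    {"home": 6, "about": 3},
    {"products": 90, "cart": 20},
]


def page_weights(sessions):
    """Assumed weighting: mean dwell time per page, scaled so the top page has weight 1."""
    totals, counts = {}, {}
    for session in sessions:
        for page, dwell in session.items():
            totals[page] = totals.get(page, 0) + dwell
            counts[page] = counts.get(page, 0) + 1
    means = {page: totals[page] / counts[page] for page in totals}
    top = max(means.values())
    return {page: mean / top for page, mean in means.items()}


def weighted_support(pageset, sessions, weights):
    """Assumed definition: mean page weight times the fraction of sessions with every page."""
    hits = sum(1 for session in sessions if all(p in session for p in pageset))
    mean_weight = sum(weights[p] for p in pageset) / len(pageset)
    return mean_weight * hits / len(sessions)


def frequent_pagesets(sessions, min_wsupport, max_size=3):
    """Enumerate every pageset up to max_size and keep those meeting the weighted threshold."""
    weights = page_weights(sessions)
    pages = sorted(weights)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(pages, size):
            ws = weighted_support(combo, sessions, weights)
            if ws >= min_wsupport:
                result[combo] = ws
    return result


if __name__ == "__main__":
    for pageset, ws in frequent_pagesets(SESSIONS, min_wsupport=0.25).items():
        print(pageset, round(ws, 3))
```

In this toy data, "home" appears in three of the four sessions but has a short dwell time, so with a weighted minimum support of 0.25 it is not reported on its own. An unweighted miner with a 50% support threshold would keep it. This is the pruning effect the abstract attributes to weighting, shown here with exhaustive enumeration, not a single-scan tree.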
