An Efficient and Scalable Algorithm for Mining Maximal


Authors: Wael Zakaria Abd Allah (20)
Yasser Kotb El Sayed (20) (21)
Fayed Fayek Mohamed Ghaleb (20)
Keywords: Data mining; DNA microarray; mining association rules; closed itemsets; row enumeration; column enumeration; maximal high confidence rules
Publication: Lecture Notes in Computer Science
Publication year: 2013
Volume: 7988
Issue: 1
Pages: 367-378
Author affiliations:

20. Faculty of Science, Mathematics/Computer Science Department-Abbassia, Ain Shams University, Cairo, Egypt
21. Information Systems Department, College of Computer and Information Sciences, Al-Imam Muhammad ibn Saud Islamic University, Riyadh, KSA

Abstract

DNA microarrays allow the simultaneous measurement of expression levels for a large number of genes across a number of different experimental samples. Association rule mining algorithms are used to reveal biologically relevant associations between different genes under different experimental samples. In this paper, we present a new association rule mining algorithm called Mining Maximal High Confidence Rules (MMHCR). MMHCR is based on a column (gene) enumeration method that overcomes both the computational time and the memory explosion problems of the column-enumeration methods used in many microarray mining algorithms. MMHCR uses an efficient tree data structure in which each node holds a gene's name and its binary representation. The binary representation is beneficial in two ways. First, it lets MMHCR easily find all maximal high confidence rules. Second, it makes MMHCR more scalable than comparable algorithms. In our experiments on a real microarray dataset, MMHCR attained very promising results and outperformed its counterparts.
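
To illustrate why a binary representation per gene can make confidence checks cheap, here is a minimal sketch. It is not the MMHCR algorithm itself: the `GeneNode` class, the `support` and `confidence` helpers, and the toy bitmasks are all assumptions made for illustration. It assumes each gene is stored as a bitmask over samples (bit i set when the gene is expressed in sample i), so the confidence of a rule A -> B reduces to one bitwise AND and two bit counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneNode:
    """Hypothetical tree node: a gene name plus a bitmask over samples."""
    name: str
    bits: int  # bit i is 1 if the gene is expressed in sample i (assumed encoding)
    children: list[GeneNode] = field(default_factory=list)


def support(bits: int) -> int:
    """Count the samples whose bit is set in the mask."""
    return bin(bits).count("1")


def confidence(antecedent: int, consequent: int) -> float:
    """conf(A -> B) = support(A AND B) / support(A), using a bitwise AND."""
    sup_a = support(antecedent)
    if sup_a == 0:
        return 0.0  # rule is undefined when A never occurs; report 0 here
    return support(antecedent & consequent) / sup_a


# Toy data, made up for illustration: 5 samples per gene.
g1 = GeneNode("g1", 0b10110)
g2 = GeneNode("g2", 0b10111)
g3 = GeneNode("g3", 0b01001)

conf_g1_g2 = confidence(g1.bits, g2.bits)  # 1.0: g2 is set wherever g1 is
conf_g2_g1 = confidence(g2.bits, g1.bits)  # 0.75: 3 of g2's 4 samples include g1
conf_g1_g3 = confidence(g1.bits, g3.bits)  # 0.0: the two genes never co-occur
```

With this encoding, a rule would count as high confidence when the value returned by `confidence` meets a user-chosen threshold; how MMHCR enumerates candidate rules and selects the maximal ones is not described in the abstract and is not modelled here.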
