Relational clustering and its applications in text mining and bioinformatics.
Details
  • Author:Shen, Chengcheng
  • Degree:Doctor
  • Year:2010
  • Advisor:Liu, Ying
  • Institution:The University of Texas
  • ISBN:9781124464350
  • CBH:3441853
  • Country:USA
  • Language:English
  • FileSize:11425263
  • Pages:162
Abstract
Cluster analysis, or clustering, is a fundamental problem in data mining and machine learning, with wide applications in biology and medicine, market research, social network analysis, search result grouping, and other areas. Its goal is to group data objects into classes or clusters so that objects within a cluster are more similar to one another than to objects in other clusters. Most traditional clustering techniques require data objects to have attribute vectors of the same length or to be of the same type. However, the majority of data routinely generated by human activities are relational in nature, consisting of multiple types of data objects, relations between data objects, and data attributes. Relational data arise in many important applications such as text analysis, bioinformatics, movie recommendation systems, and weblog mining.
In this dissertation, we focus on co-clustering and high-order relational clustering. We present a co-clustering algorithm called Regularized Co-Clustering on Manifold (RCCM). RCCM imposes a smoothness condition on the bipartite graph and preserves local geometric structure. Its objective function can be optimized by spectral relaxation, so RCCM can be regarded as a spectral clustering approach. We also propose an Orthogonal NMF Relational Clustering algorithm (ONRC) for high-order relational clustering, based on normalized cut and existing orthogonal NMF (non-negative matrix factorization). ONRC is a direct extension of NMF-based co-clustering, and proofs of its convergence and correctness are provided. ONRC uses successive updates to minimize the matrix approximation error while keeping the indicator matrices orthogonal. Both algorithms are compared with currently available algorithms on real text data, and the results show their effectiveness.
With the progress of research in the life sciences, a large amount of data has accumulated in this field. We construct a tripartite graph composed of miRNAs, genes, and related diseases, and apply a tripartite clustering algorithm to the graph to extract useful information from miRNA-gene-disease clusters. We evaluate the biological significance of these tripartite clusters through gene function enrichment analysis, cluster density analysis, and a case study of a miRNA family, among other analyses. The analysis validates the clustering of the tripartite graph.
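The abstract's idea of co-clustering a bipartite graph by spectral relaxation can be illustrated with a minimal sketch. This is not RCCM itself: it omits the manifold regularization and shows only the classic two-way spectral bipartitioning of a document-word matrix via the second singular vectors of the degree-normalized matrix. The function name `spectral_cocluster_two_way` and the toy matrix `toy_counts` are hypothetical, chosen for illustration only.

```python
import numpy as np


def spectral_cocluster_two_way(A):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Illustrative sketch of plain spectral bipartite co-clustering; it is an
    assumption-laden stand-in, not the RCCM algorithm from the dissertation.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)  # degree of each row vertex in the bipartite graph
    col_deg = A.sum(axis=0)  # degree of each column vertex
    if np.any(row_deg == 0) or np.any(col_deg == 0):
        raise ValueError("every row and column needs at least one nonzero entry")

    # Degree-normalize: An = D1^{-1/2} A D2^{-1/2}
    d1 = 1.0 / np.sqrt(row_deg)
    d2 = 1.0 / np.sqrt(col_deg)
    An = d1[:, None] * A * d2[None, :]

    # Singular values come back in descending order; the first pair is trivial,
    # the second pair carries the relaxed normalized-cut solution.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = d1 * U[:, 1]
    z_cols = d2 * Vt[1, :]

    # Threshold the relaxed indicator at zero to get a two-way partition.
    row_labels = (z_rows > 0).astype(int)
    col_labels = (z_cols > 0).astype(int)
    return row_labels, col_labels


# Hypothetical toy data: rows are documents, columns are words.
toy_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 2, 3],
])
row_labels, col_labels = spectral_cocluster_two_way(toy_counts)
```

On this toy matrix, the first two documents and first two words share one label and the remaining documents and words share the other. Which group is called 0 or 1 depends on the arbitrary sign of the singular vectors. For more than two clusters, the usual extension keeps several singular vectors and runs k-means on the stacked row and column embeddings.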
