Knowledge discovery from relations.

Details

Author: Guo, Zhen
Degree: Ph.D.
Year: 2010
Advisors: Zhang, Zhongfei (advisor); Cutler, Michal (committee member); Faloutsos, Christos (committee member); Meng, Weiyi (committee member); Yu, Philip S. (committee member)
Institution: State University of New York
Department: Computer Science
ISBN: 9781124021997
CBH: 3404801
Country: USA
Language: English
FileSize: 2916328
Pages: 160

Abstract

A basic and classical assumption in machine learning research is the randomness assumption (also known as the i.i.d. assumption), which states that data are independently and identically generated by some known or unknown distribution. This assumption, which is the foundation of most existing approaches in the literature, simplifies the complex conditions of real-world problems and makes it tractable to obtain solutions. Real-world problems, however, very often violate this assumption in the sense that the examples are related to each other in certain ways. For example, in a collection of scientific articles, the articles are related to each other through citations and are not independent of each other. Therefore, learning approaches based on the i.i.d. assumption perform effectively only on problems that (approximately) satisfy it, and their effectiveness depends on the goodness of the approximation. In these existing approaches, the relations among data are totally ignored, so the underlying factors responsible for generating the data cannot be fully captured and discovered. The problem of learning from relational data has been receiving more and more attention recently because the rapid development of the Internet has made huge repositories (such as digital libraries) available online, where one of the most important properties is that objects in the repositories are interdependent. In order to accurately capture the intrinsic characteristics of real-world problems, one needs to incorporate the relations into the learning process. The relations among data can be categorized into two types: homogeneous relations between objects of the same type and heterogeneous relations between objects of different types.
For example, in an image database in which each image has a few words given as annotation, the relations between the words are homogeneous relations and the relations between the images and the words are heterogeneous relations. Moreover, homogeneous relations can be generalized to relations between two groups of objects of the same type, which is called the general homogeneous relation. For example, given two subsets of the whole data where no explicit relations are observed between them, implicit relations still exist in the sense that both subsets follow the same distribution. In other words, the existence of one subset implies a high probability of the existence of the other subset. Thus, this kind of homogeneous relation represents the dependence between the probability densities of two groups of data. These various explicit and implicit relations pose major challenges to the classical i.i.d. assumption; at the same time, incorporating the relations into learning processes offers potential benefits. This dissertation is dedicated to the problem of incorporating the above relations into learning processes in order to better approximate the underlying characteristics of problems. Specifically, the focus of this dissertation is on developing systematic machine learning approaches for different kinds of relational data available in various data mining tasks, including supervised learning, unsupervised learning, and semi-supervised learning. The proposed approaches have been applied to developing advanced knowledge discovery tools for data mining and information retrieval, and extensive experimental comparisons with state-of-the-art methods demonstrate very promising knowledge discovery capabilities in practice.
