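     (2) Entity resolution in multi-relation social networks. Data collected from multiple sources can be used by accurate knowledge discovery models only after it has been integrated and pre-processed. When data from multiple sources is merged into a single data set, many duplicate records arise. These records are not semantically unique and usually represent the same entity. Correctly merging these duplicates is a crucial step in producing high-quality data. This process is called entity resolution. Building on attribute matching, this paper attempts to improve the accuracy of entity resolution by exploiting the multi-relational nature of multi-relation social networks.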
Traditional data mining techniques, including classification, clustering, and association rule mining, focus on analyzing the attributes stored in dimension tables but ignore the relationships that exist between records. Conversely, the main current methods of network analysis focus on network topology and overlook the fact that the nodes in a network carry attributes. In this paper, we use the multi-relation social network (MRSN) to model raw data and study several problems on MRSNs.
     Our research on MRSNs covers the following:
     (1) Multi-relation social network modeling and network extraction. We propose a process for building a multi-relation social network from raw data, and then define operators for extracting homogeneous networks from a multi-relation social network.
     (2) Entity resolution in MRSN. Data from relevant sources must be collected, integrated, scrubbed, and pre-processed in a variety of ways before accurate models can be mined from it. When data from multiple databases is merged into a single database, many duplicate records often result. These are records that, while not syntactically identical, represent the same real-world entity. Correctly merging these records and the information they carry is an essential step in producing data of sufficient quality for mining. In this paper, we propose a method that builds on attribute matching and adds link analysis, using the multiple relations in the network to improve the accuracy of entity resolution.
     (3) Community detection. Community detection is an important method for analyzing complex networks. Current community detection algorithms use only the topological structure of the network and neglect the content of the nodes. In this paper, we propose an algorithm called CDNA, which uses both the topology information and the content of the nodes to find communities in the network.
     (4) Visualization. Visualization, which provides interactive software systems that help analysts explore and understand data, is an important step in the data mining process. This paper also studies the visualization of multi-relation social networks. We design different views for different types of networks. We also put forward the "Web browser" concept and use it to construct a large-scale Web browsing framework.
     (5) Finally, the above research results are applied to develop a visual analytics system for literature called LiterMiner, which is supported by a project called "Science and Technology Information Service System Key Technology Research and Application Demonstration" under a national science and technology fund.
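
As an illustration of items (1) and (2), the following is a minimal sketch and not the method developed in this paper. It assumes a toy bibliographic network stored as typed edges, projects the hypothetical "writes" relation onto a homogeneous co-author network, and scores candidate duplicates with a weighted sum of name similarity and neighbor overlap. The data, the function names, the Jaccard measure, the use of `difflib.SequenceMatcher`, and the weight `alpha` are all assumptions introduced only for this example.

```python
from difflib import SequenceMatcher

# Hypothetical toy data: typed edges (source, relation, target).
EDGES = [
    ("a1", "writes", "p1"),
    ("a2", "writes", "p1"),
    ("a2", "writes", "p2"),
    ("a3", "writes", "p2"),
    ("a4", "writes", "p3"),
]
# Hypothetical node attribute: the author name attached to each author node.
NAMES = {"a1": "J. Smith", "a2": "Wei Chen", "a3": "John Smith", "a4": "Jon Smyth"}


def extract_coauthor_network(edges, relation="writes"):
    """Project one relation type onto a homogeneous author-author network."""
    authors_by_paper = {}
    for src, rel, dst in edges:
        if rel == relation:
            authors_by_paper.setdefault(dst, set()).add(src)
    neighbors = {}
    for authors in authors_by_paper.values():
        for author in authors:
            # Every other author of the same paper becomes a neighbor.
            neighbors.setdefault(author, set()).update(authors - {author})
    return neighbors


def attribute_similarity(x, y):
    """String similarity of the two names, in [0, 1]."""
    return SequenceMatcher(None, NAMES[x].lower(), NAMES[y].lower()).ratio()


def link_similarity(x, y, neighbors):
    """Jaccard overlap of the two nodes' neighbor sets, in [0, 1]."""
    nx, ny = neighbors.get(x, set()), neighbors.get(y, set())
    union = nx | ny
    return len(nx & ny) / len(union) if union else 0.0


def combined_similarity(x, y, neighbors, alpha=0.5):
    """Weighted sum of attribute and link evidence (alpha is an assumed weight)."""
    return (alpha * attribute_similarity(x, y)
            + (1 - alpha) * link_similarity(x, y, neighbors))


if __name__ == "__main__":
    coauthors = extract_coauthor_network(EDGES)
    for pair in [("a1", "a3"), ("a1", "a4")]:
        print(pair, round(combined_similarity(*pair, coauthors), 3))
```

In this toy data, "a1" and "a3" share the co-author "a2", so link evidence raises their score above that of "a1" and "a4", whose names are also similar but who have no co-author in common. A real system would choose the similarity measures and the merge threshold from the data.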
