Study on Massive Information Retrieval Model Based on Quotient Space Theory (基于商空间理论的海量信息检索模型的研究)

Author: Chen Shengbing (陈圣兵)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: massive information; hierarchical retrieval; quotient space; document classification; user interest mining
Degree year: 2010
Advisor: Li Longshu (李龙澍)
Discipline code: 081203
Degree-granting institution: Anhui University
Thesis submission date: 2010-04-01

Abstract

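With the widespread use of computers and the rapid development of the Internet, the volume of information available to us is growing explosively, at a geometric rate. Retrieval over such massive information resources raises two urgent problems: first, how to accurately retrieve the truly useful information from massive data without requiring users to search manually through a large pile of returned results; second, how to realize an efficient retrieval method that can search massive information quickly. Against this background, massive information retrieval technology has attracted great attention and has become one of the main research topics in the field of information retrieval.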
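     Quotient space theory draws on the way humans observe and analyze problems at multiple levels and granularities. It unifies the structure of worlds of different granularity with the mathematical notions of set and space, and builds object models to solve complex problems in practical engineering. Observing and analyzing a problem at a coarser granularity can simplify the problem and speed up its solution, which makes the approach particularly suitable for large-scale complex problems. This dissertation takes massive information repositories as its research object and quotient space theory as its tool, and studies massive information retrieval based on quotient space theory. The main research contents and innovations are as follows: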
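     (1) Quotient space theory and its methods are studied in depth; a hierarchical information repository structure and a corresponding hierarchical retrieval model are proposed, and the time complexity of the hierarchical retrieval algorithm is analyzed. The information repository is extended from the traditional single-layer structure to a hierarchical tree structure, and an attribute value is defined for every node. This reveals the category characteristics of the repository at different levels, allows fast conversion between different information granules, and makes comparison and computation between nodes, and between a node and the query vector, easy to implement. Traditional massive information retrieval methods raise retrieval speed simply by adding processors. The hierarchical retrieval algorithm instead uses layer-by-layer stepwise refinement to obtain the retrieval domain relevant to the query and then searches within that domain. Because the relevant domain is far smaller than the whole information space, hierarchical retrieval can effectively solve the problems caused by excessive scale in massive information retrieval.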
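     (2) Methods for building the hierarchical structure of the information repository and a multi-granularity document granulation algorithm are studied, so as to construct a hierarchically structured information repository. Using intelligent Agent technology and clustering technology respectively, the dissertation proposes construction methods for the repository hierarchy and gives an ontology-based method for representing and storing the repository structure. On the basis of the ontology structure, equivalence relations and equivalence classes are then defined at the different levels, the quotient space of the information is constructed, and a construction algorithm for the hierarchically structured repository is proposed. Because document granulation strictly follows the equivalence relations and equivalence classes while the quotient space is constructed, the repository built by this method satisfies the "Guaranteed False Principle" (保假原理) of quotient space theory, which lays the data foundation for hierarchical retrieval.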
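     (3) For the classification of massive document collections, SVM-based classification of massive multi-class documents is studied from two aspects: the multi-class problem and the training speed problem. First, building on an analysis of traditional multi-class SVMs, an ECC-SVM based on a genetic algorithm is proposed; the genetic algorithm is used to solve the codebook problem of ECC-SVM, yielding an efficient multi-class SVM. Then, an algorithm that reduces the size of the training set in the original sample space is proposed to address SVM training on large-scale sample sets. The algorithm introduces a new distance measure, called the k-nearest-neighbor distance (k-DNN). k-DNN is used to derive the corresponding between-class and within-class distances, together with methods for noise identification and for evaluating sample importance, and from these a training-sample reduction algorithm is proposed. k-DNN takes the average of the distances to the k nearest samples and is a more general form of the traditional distance. It effectively overcomes the limitations of the traditional distance, namely its strong dependence on chance, its sensitivity to noise, and its sensitivity to the sample distribution, so that the between-class and within-class distances become more reasonable.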
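     (4) The personalization of the hierarchical retrieval model and the dynamic acquisition of user interests under a multi-level structure are studied. A personalized hierarchical retrieval model is proposed so that the hierarchical retrieval in this dissertation can return different results according to different user backgrounds. Then, based on the hierarchical characteristics of website structure, a dynamic acquisition algorithm for multi-level user interests based on the ant colony algorithm is proposed. The algorithm is easy to implement, provides user-interest information at higher levels and with richer content, and effectively overcomes the limitation of traditional mining methods, which capture only long-term interests and cannot track changes in user interest. It is therefore well suited to a complex and changing network environment.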
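As a rough illustration of the coarse-to-fine idea in contribution (1), the sketch below descends a small category tree by comparing the query vector with each child's attribute vector, then ranks only the documents under the selected leaf. It is not the dissertation's algorithm: the `Node` class, the `hierarchical_search` function, the cosine similarity measure, the greedy choice of a single child per level, and the toy data are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    """Hypothetical tree node: a category with an attribute vector."""
    name: str
    feature: tuple                                   # assumed term-weight vector of the node
    children: list = field(default_factory=list)     # finer-grained categories
    documents: list = field(default_factory=list)    # (doc_id, vector) pairs, kept at leaves


def hierarchical_search(root, query, top_n=2):
    """Descend from coarse to fine granularity, then rank documents in one leaf only."""
    node = root
    path = [node.name]
    while node.children:
        # Assumption: keep only the single child most similar to the query.
        node = max(node.children, key=lambda c: cosine(c.feature, query))
        path.append(node.name)
    ranked = sorted(node.documents, key=lambda d: cosine(d[1], query), reverse=True)
    return path, [doc_id for doc_id, _ in ranked[:top_n]]


# Toy repository; vector dimensions are (sports, hardware, software).
root = Node("root", (1.0, 1.0, 1.0), children=[
    Node("sports", (1.0, 0.0, 0.0), documents=[("s1", (0.9, 0.1, 0.0))]),
    Node("technology", (0.0, 0.5, 0.5), children=[
        Node("hardware", (0.0, 1.0, 0.1), documents=[("h1", (0.0, 0.9, 0.1))]),
        Node("software", (0.0, 0.1, 1.0), documents=[
            ("w1", (0.0, 0.2, 0.9)),
            ("w2", (0.1, 0.0, 0.6)),
            ("w3", (0.0, 0.7, 0.7)),
        ]),
    ]),
])

query = (0.0, 0.1, 1.0)
path, hits = hierarchical_search(root, query)
print(path, hits)
```

The toy tree is too small to show any saving. On a realistic repository the selected leaf would be much smaller than the whole collection, which is where the speed-up described in the abstract would come from. A greedy single-path descent can also miss relevant documents in sibling branches, so a practical variant might keep several candidate children per level.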
With the widespread application of computers and the rapid development of the Internet, the scale of information is increasing at an exponential rate. Massive information retrieval faces two problems. First, how to obtain useful information accurately from large-scale information resources, without manually searching through the many results that a retrieval system returns. Second, how to realize an efficient retrieval method that can retrieve massive information rapidly. Against this background, massive information retrieval has aroused great interest and has become one of the major topics in the field of information retrieval.
     Inspired by human intelligence, which can observe and analyze problems at multiple levels and granularities, quotient space theory unifies the structure of multi-granularity objects with the mathematical concepts of set and space, and establishes object models to solve complex problems in practical engineering. Observing and analyzing a problem at a coarser granularity makes the problem simpler and the solution faster, especially for large, complex problems. In this dissertation, with massive information as the object and quotient space theory as the tool, the problems of massive information retrieval based on quotient space theory are studied. The main contents and innovations are as follows:
     (1) Based on quotient space theory, a hierarchical structure for information resources and a method of hierarchical information retrieval are proposed, and the time complexity of hierarchical retrieval is analyzed. The information resource structure is changed from the traditional single-layer structure to a hierarchical tree structure, and a feature value is defined for each node. This structure reveals the class characteristics of the information resource at different levels, transforms information rapidly between different granularities, and makes it easy to compare and compute between nodes or between a node and the query vector. To increase retrieval speed, traditional methods depend on increasing the number of processors, whereas the hierarchical retrieval algorithm uses hierarchical stepwise refinement to obtain a retrieval field that is much smaller than the whole information space. By greatly reducing the retrieval field, hierarchical information retrieval can effectively solve the problems posed by massive information resources.
     (2) To establish the hierarchically structured information resource, the method of creating the hierarchical structure and the algorithm of multi-granularity document granulation are investigated. Using Agent technology and clustering technology, the dissertation proposes two methods to create the hierarchy structure and gives an ontology-based representation and storage method. Then, based on this hierarchy structure, equivalence relations and equivalence classes are defined to establish the quotient space of the information resource and to obtain the document granulation algorithm. Because document granulation is strictly based on equivalence relations and equivalence classes in the process of creating the quotient space, the hierarchical information database meets the "Guaranteed False Principle" of quotient space theory, which lays the data foundation for hierarchical information retrieval.
     (3) To solve the classification problem for massive documents, the classification method for massive multi-class documents is investigated from two aspects: the multi-class classification problem and training speed. First, based on an analysis of the traditional multi-class SVM, a genetic algorithm is used to solve the codebook problem of ECC-SVM, and a new, efficient ECC-SVM based on genetic algorithms is proposed. Then, to address SVM training speed on large-scale sample sets, a method for reducing SVM training samples in the original sample space is proposed. A new distance measure, called the distance of k-nearest neighbors (k-DNN), is presented; the corresponding between-class and within-class distances are defined according to k-DNN; methods for noise identification and sample importance evaluation are proposed; and from these an algorithm for reducing SVM training samples is derived. Taking the average distance between a sample and its k nearest samples, k-DNN is a more general form of the traditional distance. It avoids the limitations of the traditional distance, such as contingency, noise sensitivity, and distribution sensitivity, and makes the between-class and within-class distances more reasonable.
     (4) The methods of personalized hierarchical information retrieval and dynamic multi-level user interest extraction are investigated. To return different information to different users according to their backgrounds, a model of personalized hierarchical information retrieval is proposed. Then, according to the hierarchical characteristics of websites, an algorithm for extracting dynamic multi-level user interest based on ant colony optimization is presented. The algorithm is easy to implement, provides more information about user interest at higher levels, and effectively overcomes the limitation of traditional mining, which obtains only long-term interest and cannot capture dynamic user interest. It is especially suited to the complex, dynamic Internet environment.
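The k-DNN measure in contribution (3) is described only as the average distance between a sample and its k nearest samples. A minimal sketch of that description follows, assuming Euclidean distance and assuming the sample is compared against a set of other points, such as the members of one class. The function name `k_dnn` and the toy data are hypothetical, and the dissertation's exact definition may differ.

```python
import heapq
import math


def k_dnn(x, samples, k):
    """Mean Euclidean distance from x to its k nearest points in samples.

    Assumption: if k exceeds the number of samples, all samples are used.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    distances = [math.dist(x, s) for s in samples]
    nearest = heapq.nsmallest(k, distances)
    if not nearest:
        raise ValueError("samples must not be empty")
    return sum(nearest) / len(nearest)


# Class B lies far from the query, except for one stray point close to it.
class_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (0.5, 0.5)]
query = (0.0, 0.0)

d1 = k_dnn(query, class_b, k=1)  # k = 1 reduces to the plain nearest-neighbor distance
d3 = k_dnn(query, class_b, k=3)  # averaging over three neighbors dilutes the stray point
print(round(d1, 3), round(d3, 3))
```

With k = 1 the result is decided entirely by the single stray point, which illustrates the noise sensitivity the abstract attributes to the traditional distance. Averaging over k neighbors pulls the value back toward the bulk of the class. When `x` itself belongs to `samples`, as in a within-class distance, it would need to be excluded first so that its zero distance to itself is not counted.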

