We are in the era of the knowledge-based economy. Knowledge has replaced traditional factors such as land, natural resources, capital and labour as the major force driving social progress and development, and management models, theories and techniques must adapt to the knowledge-based economy. To meet this challenge, this dissertation studies Chinese word segmentation and text classification, and also presents a distributed knowledge management architecture. The main contributions are as follows:
     (1) An adaptive Chinese word segmentation algorithm is presented. New-word recognition and ambiguity resolution are the key problems in Chinese word segmentation. The results of traditional dictionary-based matching algorithms depend heavily on how representative the dictionary is, so they cannot recognize new words effectively, especially in specialized domains. The algorithm in this dissertation is based on a 2-gram statistical model and meets application requirements for both accuracy and efficiency. Long sentences and long terms are handled with a divide-and-conquer approach, while partial probability and overall probability are used to identify new words.
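     As an illustration of segmentation with a 2-gram model, the following is a minimal sketch and not the dissertation's algorithm: it omits the divide-and-conquer step and new-word detection, and simply picks the word sequence with the highest bigram score by dynamic programming. The function name `segment`, the back-off to unigram probabilities, the `floor` value for unseen characters and the toy probability tables are all assumptions made for the example.

```python
import math

def segment(text, unigram, bigram, max_len=4, floor=1e-8):
    """Return the segmentation of `text` with the highest 2-gram score.

    unigram: dict word -> P(word)
    bigram:  dict (previous word, word) -> P(word | previous word)
    Unseen bigrams back off to the unigram probability (an assumption of
    this sketch); unseen single characters get the `floor` probability.
    """
    def logp(prev, word):
        p = bigram.get((prev, word))
        if p is None:
            p = unigram.get(word, floor)
        return math.log(p)

    n = len(text)
    # states[i] maps the last word ending at position i to
    # (best log score so far, previous word on that best path).
    states = [dict() for _ in range(n + 1)]
    states[0][None] = (0.0, None)  # None marks the sentence start
    for i in range(n):
        for prev, (score, _) in states[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                word = text[i:j]
                if len(word) > 1 and word not in unigram:
                    continue  # only single characters may be out of vocabulary
                cand = score + logp(prev, word)
                if word not in states[j] or cand > states[j][word][0]:
                    states[j][word] = (cand, prev)

    # Trace back from the best final state.
    word = max(states[n], key=lambda w: states[n][w][0])
    out, i = [], n
    while word is not None:
        out.append(word)
        prev = states[i][word][1]
        i -= len(word)
        word = prev
    return out[::-1]

# Toy probabilities, invented for illustration only.
UNIGRAM = {"研究": 0.02, "研究生": 0.01, "生命": 0.02,
           "生": 0.001, "命": 0.001, "起源": 0.01}
BIGRAM = {("研究", "生命"): 0.2, ("生命", "起源"): 0.3}
```

     With these toy tables, `segment("研究生命起源", UNIGRAM, BIGRAM)` prefers 研究 / 生命 / 起源 over 研究生 / 命 / 起源, which is the kind of ambiguity a statistical model is meant to resolve.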
     (2) A classification algorithm based on proximal support vector machines (PSVM) is proposed. The main difference between PSVM and the standard SVM lies in the constraints of the optimization problem: SVM treats classification as a quadratic programming problem with linear inequality constraints, whereas PSVM treats it as a quadratic programming problem with linear equality constraints only. This dissertation describes a new PSVM training algorithm based on dimension reduction, which trains faster and requires less memory. Experiments on several data sets show that, compared with SVM, the new algorithm gives better classification performance under time-sensitive conditions at the cost of some loss of accuracy.
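     Because the PSVM constraints are equalities, the standard linear PSVM reduces to a single system of linear equations rather than an iterative quadratic program. The sketch below shows that standard closed form, (I/ν + EᵀE) z = Eᵀy with E = [X, −1] and z = [w; γ]; it is not the dimension-reduction training algorithm proposed in the dissertation. The helper names `solve`, `train_psvm` and `predict`, the default ν = 1 and the toy data are assumptions for illustration.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def train_psvm(X, y, nu=1.0):
    """Linear PSVM: solve (I/nu + E^T E) z = E^T y, z = [w; gamma].

    X: list of feature vectors; y: list of labels in {+1, -1}.
    """
    E = [list(x) + [-1.0] for x in X]          # append the bias column
    d = len(E[0])
    M = [[sum(e[i] * e[j] for e in E) + (1.0 / nu if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(e[i] * label for e, label in zip(E, y)) for i in range(d)]
    z = solve(M, rhs)
    return z[:-1], z[-1]                        # weights w, offset gamma

def predict(w, gamma, x):
    """Classify x by the sign of w.x - gamma."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1

# Toy, linearly separable data invented for illustration.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
Y = [1, 1, 1, -1, -1, -1]
W, GAMMA = train_psvm(X, Y)
```

     Training cost here is dominated by building and solving a system whose size is the feature dimension plus one, which is why PSVM-style methods are attractive when training time matters.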
     (3) A new ontology-based hierarchical text classification algorithm is presented. Text classification usually refers to flat text classification, whereas hierarchical text classification focuses on classification over multiple classes. Text knowledge management systems usually serve specific fields, and their texts are somewhat ambiguous and therefore tend to belong to multiple classes. Users require both text relevance and multiple levels of concept granularity, so better means of organizing texts hierarchically are needed. The algorithm implements multi-granularity concepts in hierarchical classification by using a knowledge ontology and controlled keywords. It can also handle flat classification.
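     One way to picture hierarchical classification over a concept tree with controlled keywords is a greedy top-down descent, sketched below. This is an assumption-laden illustration rather than the dissertation's algorithm: the `Concept` class, the keyword-overlap score, the `max_depth` granularity parameter and the toy ontology are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A node in a toy concept tree with its controlled keywords."""
    name: str
    keywords: set
    children: list = field(default_factory=list)

def classify(tokens, root, max_depth=None):
    """Descend from `root`, at each level choosing the child whose
    controlled keywords overlap most with the document tokens.

    `max_depth` limits how fine-grained the returned concept path is;
    descent also stops when no child matches any token.
    """
    path, node, depth = [], root, 0
    while node.children and (max_depth is None or depth < max_depth):
        scored = [(sum(t in c.keywords for t in tokens), c)
                  for c in node.children]
        best_score, best = max(scored, key=lambda sc: sc[0])
        if best_score == 0:
            break
        path.append(best.name)
        node, depth = best, depth + 1
    return path

# Toy ontology invented for illustration.
ROOT = Concept("root", set(), [
    Concept("tax", {"tax", "income", "vat"}, [
        Concept("income_tax", {"income", "salary"}),
        Concept("vat", {"vat", "invoice"}),
    ]),
    Concept("agriculture", {"crop", "pest", "farm"}),
])
```

     Calling `classify(tokens, ROOT, max_depth=1)` returns only the top-level class, so a one-level tree behaves like a flat classifier, while a larger `max_depth` yields a finer-grained concept path.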
     (4) A distributed knowledge management model based on Super-P2P is presented to address the problems of centralized knowledge management. As organizations become more distributed, effective distributed knowledge management has become the trend in knowledge management.
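     The general super-peer idea can be sketched as follows: ordinary peers register their documents with a super-peer, which indexes them and forwards queries to neighbouring super-peers. This is a generic illustration of a super-peer network, not the model developed in the dissertation; the `SuperPeer` class, its methods and the `ttl` hop limit are assumptions.

```python
class SuperPeer:
    """A toy super-peer that indexes documents held by ordinary peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # keyword -> set of (peer name, document id)
        self.neighbors = []  # other super-peers this one forwards to

    def register(self, peer, doc_id, keywords):
        """Record that `peer` holds `doc_id`, indexed under `keywords`."""
        for kw in keywords:
            self.index.setdefault(kw, set()).add((peer, doc_id))

    def query(self, keyword, ttl=2, seen=None):
        """Answer from the local index, then forward to neighbours.

        `ttl` bounds the number of forwarding hops; `seen` prevents a
        query from visiting the same super-peer twice.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.index.get(keyword, ()))
        if ttl > 0:
            for sp in self.neighbors:
                hits |= sp.query(keyword, ttl - 1, seen)
        return hits

# Two super-peers linked to each other, with one ordinary peer each.
SP_A, SP_B = SuperPeer("A"), SuperPeer("B")
SP_A.neighbors.append(SP_B)
SP_B.neighbors.append(SP_A)
SP_A.register("peer1", "doc1", ["svm"])
SP_B.register("peer2", "doc2", ["svm", "p2p"])
```

     A query sent to either super-peer finds documents held anywhere in the network without a central server, which is the property that motivates moving away from centralized knowledge management.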
     Based on the above research, eKnow, a suite of Super-P2P-based text knowledge management software with integrated workflow, has been developed with the support of Shanghai Pudong SD Funds and Baosight Co. Ltd. Its design ideas, system architecture and technical framework are summarized. The software has been used in several cases with substantial economic benefits.
