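Research on Several Problems of Semantic Annotation for Domain-Specific Web Pages
Doctoral dissertation, Jilin University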
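Abstract
Adding semantic metadata to web pages, and thereby converting them into machine-understandable semantic descriptions, falls within the scope of semantic annotation research. This research is essential to realizing the Semantic Web vision sooner, and it also plays an important role in improving the performance of automated applications on today's Web. Building on an in-depth analysis of prior work, the author draws on knowledge and methods from the Semantic Web, ontology construction, natural language processing, machine learning, and web mining to study "semantic annotation for domain-specific web pages". The main research contents are:
     1. Semantic annotation research and its related techniques are analyzed and summarized comprehensively.
     2. Building on existing ontology construction methods, a four-phase ontology construction method is proposed that is driven by research requirements and supports research groups working in a distributed environment.
     3. Because the free edition of HowNet 2000 (hereafter HowNet) lacks a programming interface that the project needs, a technical solution based on reverse engineering is given for obtaining the HowNet programming interface, and the recovered interface is used in the experiments.
     4. A framework is proposed for the semantic annotation of Chinese natural-language web documents that is guided by a domain ontology and combines statistical methods with natural language processing (NLP) techniques. The framework consists of a data preparation phase, an identification phase, and a grouping phase. In the data preparation phase, feature extraction is used to build the domain lexicon and form the type tagging gazetteer. In the identification phase, an explicit type tagging algorithm is proposed to recognize the instances and properties in the text. In the grouping phase, a relation extraction algorithm based on the dependency tree and a relation extraction algorithm based on the dependency forest are proposed to complete relation extraction. In addition, an active learning method based on an influence function is given, which improves annotation performance by asking the user questions interactively.
     5. A semantic annotation framework based on mining the frequent feature patterns of sentences is proposed, comprising three phases: data preprocessing, pattern mining, and rule processing. In the data preprocessing phase, a feature sentence extraction algorithm and a feature sequence generation algorithm are proposed. In the pattern mining phase, a suffix-array-based algorithm for mining the frequent feature patterns of sentences is proposed. In the rule processing phase, the mined feature patterns are used to write annotation rules, which are then applied in the semantic annotation process.
     This research is supported by the open project "Research on a Second-Generation Browser Prototype" (60496321) under the National Natural Science Foundation of China Major Program "Basic Theory and Core Technology of Non-Canonical Knowledge". The research results have been applied in the prototype system CRAB.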
The rapid development of web technology has brought explosive growth in web resources, making the World Wide Web the largest information repository in the world. Although the web provides people with vast amounts of information, it has increasingly exposed a serious problem, information overload: information is abundant while the means of acquiring it are relatively scarce, which makes it difficult for people to obtain useful knowledge. To tackle this problem, people use web information retrieval technology (for example, search engines) and automated agents based on information extraction. However, the lack of machine-understandable semantics in web content makes it difficult for such software to be highly effective. The vision of the Semantic Web is to make web content machine-understandable. Achieving this vision will enable machines to make full use of the semantic information in web pages and to meet users' demands for knowledge effectively. Realizing it requires a large amount of web content that carries semantic metadata, but existing web pages contain little of it. Adding semantic metadata to web pages is the subject of semantic annotation research. This research helps narrow the gap between the current web and the Semantic Web so that the vision can be realized sooner; it improves the performance of search engines and bridges the knowledge gap between users and search engines during search; and it lowers the development cost of automated agents while increasing their robustness and intelligence.
     The thesis is financially supported by the Major Research Program of the National Natural Science Foundation of China under grant No. 60496321. Based on an in-depth analysis of related research and existing methods, it draws on theories and methods from several areas of computer science, including the Semantic Web, ontology engineering, natural language processing, machine learning, and web mining, to study semantic annotation for domain-specific web pages. The results have been used in the prototype system CRAB.
     The main research results and technical contributions of this thesis are as follows. The thesis has introduced and analyzed the current state of the art of semantic annotation research and its related techniques. By comparing the current web with the vision of the Semantic Web, it has pointed out the urgency and importance of research on semantic annotation. Based on an analysis and definition of the concepts of semantics, annotation, and semantic annotation, the thesis has introduced the categories and development of annotation and has reviewed the work related to semantic annotation. In addition, ontology and ontology engineering, which are closely related to semantic annotation, are introduced in depth. All of the above lays the groundwork for the research that follows.
     Based on existing ontology engineering methods, this thesis has presented a four-phase method for constructing the domain ontology, which is driven by research requirements and supports research groups working in a decentralized environment. The building process is divided into four phases: (1) building together, (2) local adaptation, (3) analysis and revision, and (4) release and update. Apart from the first phase, the phases are performed in iterative cycles. After each cycle, a newer version of the domain ontology is released and the prototype of the domain ontology evolves. This method is suited to scenarios where users' needs change frequently, and it facilitates the rapid development of ontologies.
     HowNet is an important knowledge base of common sense. However, the lack of a programming interface in HowNet (free edition) makes it hard for researchers to use it efficiently. Hence, this thesis has given a technical solution for obtaining the interface, which is a valuable exploration of the reverse engineering of binary code. By analyzing the assembly code statically and tracing it dynamically, the thesis has successfully extracted the function interface of HowNet and has generated the header files and libraries according to the function calling conventions. The work makes two contributions. First, it provides the programming interface of the HowNet software and facilitates research related to HowNet. Second, it is a useful reference example of making full use of legacy binary code in research, and especially of reusing binary code for which no programming interface documentation is available.
     Noting the similarity between two forms of knowledge representation, natural language sentences and RDF representations, the thesis has proposed a methodology framework for the semantic annotation of Chinese web pages, which is guided by the domain ontology and employs statistical methods and natural language processing (NLP) technology. The framework comprises three phases: the data preparation phase, the identification phase, and the grouping phase.
     In the data preparation phase, a focused crawler is employed to build a repository of domain-specific web pages. The domain lexicon is constructed with a feature selection technique that obtains high-frequency, domain-relevant words from the repository. After the words in the domain lexicon are manually labeled with the types corresponding to concepts or properties of the domain ontology, the type tagging gazetteer is generated.
     In the identification phase, the thesis has proposed an explicit property type tagging algorithm (EPTT). The tagging types are divided into two kinds: ontology types and general types. The algorithm uses both rules and gazetteers to recognize the instances and properties in the text. Compared with common named entity recognition methods, this method makes further processing easier by explicitly tagging the words of property type.
     In the grouping phase, the thesis has grouped the words of a sentence by their dependency relationships, has proposed the concepts of dependency tree and dependency forest, and has given two algorithms: the relation extraction algorithm based on the dependency tree (DTRE) and the relation extraction algorithm based on the dependency forest (DFRE). The DTRE algorithm first uses natural language processing (NLP) techniques to parse a given sentence and obtain the dependency relationships between its words, constructs the dependency tree from them, and then generates the Grammar Relation Triples (GRTs). By combining the domain ontology and the type tagging results, the algorithm validates the GRTs. Each valid GRT is converted into a knowledge triple (an RDF statement) that corresponds to the domain ontology. Thus, the mapping from the natural language sentence to the RDF representation is completed. The DFRE algorithm is an improvement on DTRE designed mainly to handle long Chinese sentences. It decomposes a long sentence into clauses and constructs the dependency tree of each clause. After all the dependency trees are merged into a dependency forest, the DTRE algorithm is called to accomplish the relation extraction. The experimental results show that both methods are significantly more effective than a semantic annotation method based on the subject-verb-object grammatical relationship.
     In addition, an active learning approach based on an influence formula has been presented to improve annotation performance. The influence formula is defined in terms of two aspects: the difficulty of annotating a triple, and the influence that annotating this triple has on the other triples in the collection.
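The grouping phase can be illustrated with a minimal sketch. It is not the thesis's DTRE or DFRE implementation: the abstract does not say how GRTs are enumerated or validated, so the sketch assumes a dependency parse is already available as (head, dependent) arcs, pairs the dependents of every property-typed head, and keeps a candidate only if its types match the property's domain and range. The ontology, the type tags, and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical domain ontology: property -> (domain class, range class).
ONTOLOGY = {
    "locatedIn": ("University", "City"),
    "foundedIn": ("University", "Year"),
}

# Hypothetical type tagging results: word -> ontology class or property.
TYPE_TAGS = {
    "Jilin University": "University",
    "Changchun": "City",
    "1946": "Year",
    "located": "locatedIn",
    "founded": "foundedIn",
}


def candidate_grts(arcs):
    """Yield (dependent, head, dependent) candidates for each property-typed head."""
    dependents = {}
    for head, dep in arcs:
        dependents.setdefault(head, []).append(dep)
    for head, deps in dependents.items():
        if TYPE_TAGS.get(head) in ONTOLOGY:
            for subj, obj in permutations(deps, 2):
                yield subj, head, obj


def validate(grt):
    """Return an RDF-style triple if the candidate fits the ontology, else None."""
    subj, head, obj = grt
    prop = TYPE_TAGS[head]
    domain, range_ = ONTOLOGY[prop]
    if TYPE_TAGS.get(subj) == domain and TYPE_TAGS.get(obj) == range_:
        return subj, prop, obj
    return None


def extract_relations(forest):
    """Run the tree-level extraction over every clause tree in the forest."""
    triples = []
    for arcs in forest:
        for grt in candidate_grts(arcs):
            triple = validate(grt)
            if triple is not None:
                triples.append(triple)
    return triples


# Two clause-level dependency trees, each given as (head, dependent) arcs.
forest = [
    [("located", "Jilin University"), ("located", "Changchun")],
    [("founded", "Jilin University"), ("founded", "1946")],
]
triples = extract_relations(forest)
```

Treating the forest as a list of per-clause trees mirrors the description of DFRE as clause splitting followed by the tree-level procedure; a real system would obtain the arcs from a Chinese dependency parser rather than writing them by hand.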
     Noting that some sentence patterns occur frequently in domain articles, the thesis has presented a method of semantic annotation based on mining the frequent feature patterns of sentences. Following the theory of sequential pattern mining, the thesis has defined the feature itemset, the feature item, and the feature sequence, which are used in mining the frequent feature patterns of sentences. By defining the feature items as word types and the feature sequences as strings of type identifiers, a semantic abstraction of the original sentences can be obtained. On the basis of these definitions, a methodology framework has been proposed that is composed of three phases: the data preprocessing phase, the pattern mining phase, and the rule processing phase.
     In the data preprocessing phase, the thesis first extracts the words of property type from the type tagging gazetteer to build the feature word list. Using the defined formula for calculating the feature strength of a sentence, the feature sentences, whose feature strengths are higher than a predefined threshold, are extracted from the whole sentence space. From the feature sentences, the corresponding feature sequence database is then constructed with the feature sequence generation algorithm.
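The preprocessing steps can be sketched as follows. The abstract does not give the feature strength formula, so this sketch assumes a stand-in: the fraction of a sentence's tokens that are feature words. The gazetteer, the one-letter type identifiers, the filler symbol `o` for untyped words, and the function names are likewise hypothetical.

```python
# Hypothetical type tagging gazetteer: word -> one-letter type identifier.
GAZETTEER = {
    "university": "U",
    "city": "C",
    "located": "P",   # "P" marks a word of property type
    "founded": "P",
}
PROPERTY_TYPES = {"P"}


def build_feature_words(gazetteer, property_types):
    """Collect the gazetteer words whose type is a property type."""
    return {word for word, tag in gazetteer.items() if tag in property_types}


def feature_strength(tokens, feature_words):
    """Assumed formula: share of tokens that are feature words."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in feature_words) / len(tokens)


def extract_feature_sentences(sentences, feature_words, threshold):
    """Keep the sentences whose feature strength exceeds the threshold."""
    return [s for s in sentences if feature_strength(s, feature_words) > threshold]


def feature_sequence(tokens, gazetteer, other="o"):
    """Map each token to its type identifier; untyped tokens become `other`."""
    return "".join(gazetteer.get(tok, other) for tok in tokens)


sentences = [
    ["the", "university", "located", "city"],
    ["the", "weather", "is", "nice"],
]
feature_words = build_feature_words(GAZETTEER, PROPERTY_TYPES)
kept = extract_feature_sentences(sentences, feature_words, threshold=0.2)
sequence_db = [feature_sequence(s, GAZETTEER) for s in kept]
```

Representing each sequence as a plain string of type identifiers keeps the later mining step a string problem, which is what makes a suffix array applicable.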
     In the pattern mining phase, the feature sequence database is processed by the proposed suffix-array-based sequential pattern mining algorithm, and the frequent feature patterns are obtained. This mining algorithm makes full use of the advantage of the suffix array in processing long sequences. The core idea is to convert the calculation of the supports of the feature patterns in the feature sequence database into the calculation of the document frequencies of the feature patterns across the sequence documents.
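The idea of computing support as document frequency can be sketched as below. This is an illustration under stated assumptions, not the thesis's algorithm: patterns are taken to be contiguous substrings of the feature sequences, the suffix array is built naively by sorting all suffixes, and the names `build_suffix_array`, `document_frequency`, and `frequent_patterns` are hypothetical.

```python
from bisect import bisect_left


def build_suffix_array(sequences):
    """Return sorted (suffix, doc_id) pairs over all sequences (naive build)."""
    suffixes = [
        (seq[i:], doc_id)
        for doc_id, seq in enumerate(sequences)
        for i in range(len(seq))
    ]
    suffixes.sort()
    return suffixes


def document_frequency(suffixes, pattern):
    """Count the distinct sequences that contain `pattern` as a substring."""
    # Suffixes starting with `pattern` form one contiguous run in sorted order.
    start = bisect_left(suffixes, (pattern,))
    docs = set()
    for suffix, doc_id in suffixes[start:]:
        if not suffix.startswith(pattern):
            break
        docs.add(doc_id)
    return len(docs)


def frequent_patterns(sequences, min_support, max_len):
    """Return {pattern: document frequency} for patterns meeting min_support."""
    suffixes = build_suffix_array(sequences)
    seen = set()
    frequent = {}
    for suffix, _ in suffixes:
        # Every substring is a prefix of some suffix, so prefixes cover all candidates.
        for k in range(1, min(len(suffix), max_len) + 1):
            pattern = suffix[:k]
            if pattern in seen:
                continue
            seen.add(pattern)
            df = document_frequency(suffixes, pattern)
            if df < min_support:
                break  # extending an infrequent pattern cannot make it frequent
            frequent[pattern] = df
    return frequent


patterns = frequent_patterns(["UPC", "oUPC", "UPY"], min_support=2, max_len=3)
```

Counting distinct sequence identifiers rather than raw occurrences is what turns a substring count into a support value: a pattern repeated many times in one sentence still supports it only once.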
     In the rule processing phase, the thesis has written annotation rules according to the mined feature patterns and has applied them to semantic annotation. The experimental results show that this method can handle some domain-specific sentences effectively and avoids the errors caused by the parser, which improves the precision of the annotation. Combining this method with the DFRE method significantly improves the performance of semantic annotation.
引文
[1]. Berners-Lee T. Information management:A proposal[EB/OL].1989 [2010-12-10]. http://www.w3.org/History/1989/proposal.html.
    [2]. MICHAEL K B. The deep Web:surfacing hidden value[J]. The Journal of Electronic Publishing,2001,7(1):1080-2711.
    [3]. Mika P. Social networks and the Semantic Web[M]. New York, USA:Springer Science & Business,2007.
    [4]. O'reilly T. What is web 2.0[EB/OL].2005[2010-12-11]. http://oreilly.com/web2/ archive/what-is-web-20.html.
    [5]. Filipe J, Cordeiro J, Pedrosa V. Web Information Systems and Technologies[M]. New York, USA:Springer,2007.
    [6]. Chang K C, He B, Li C, et al. Structured databases on the Web:Observations and implications[J]. ACM SIGMOD Record,2004,33(3):61-70.
    [7]. Lawrence S, Giles C L. Searching the world wide Web[J]. Science,1998,280(5360): 98.
    [8]. Lawrence S, Giles C L. Accessibility of information on the Web[J]. Nature,1999, 400(6740):107-109.
    [9]. Kosala R, Blockeel H. Web mining research:A survey[J]. ACM SIGKDD Explorations Newsletter,2000,2(1):1-15.
    [10]. Kobayashi M, Takeda K. Information retrieval on the Web[J]. ACM Computing Surveys (CSUR),2000,32(2):144-173.
    [11]. Baeza-Yates R, Ribeiro-Neto B. Modern information retrieval[M]. NJ, USA:Pearson Education,1999.
    [12]. Manning C D, Raghavan P, Schutze H. Introduction to information retrieval[M]. London:Cambridge University Press,2008.
    [13]. Croft B, Metzler D, Strohman T. Search engines:Information retrieval in practice[M]. NJ, USA:Addison-Wesley,2009.
    [14]. Moens M. Information extraction:algorithms and prospects in a retrieval context[M]. New York, USA:Springer,2006.
    [15]. Sarawagi S. Information extraction[J]. Foundations and Trends in Databases,2008,1(3): 261-377.
    [16]. Maes P. Agents that reduce work and information overload[J]. Communications of the ACM,1994,37(7):30-40.
    [17]. Mohammadian M. Intelligent agents for data mining and information retrieval[M]. Hershey, PA 17033, USA:Idea Group Inc (IGI),2004.
    [18]. Berners-Lee T. Semantic web road map[EB/OL].1998[2010-12-11]. http://www.w3. org/Designlssues/Semantic.html.
    [19]. Berners-Lee T. Semantic web architecture[EB/OL]. World Wide Web Consortium, 2000[2010-12-22]. http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0. html.
    [20]. Berners-Lee T, Hendler J, Lassila O. The semantic Web[J]. Scientific American,2001, 284(5):28-37.
    [21]. Herman I. Semantic web activity statement[EB/OL].2003[2010-12-24]. http://www.w3. org/2001/sw/Activity.html.
    [22]. Tim S C, Finin T. Joshi A, et al. ITTALKS:A Case Study in the Semantic Web and DAML[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:477-494.
    [23]. Berners-Lee T, Karger D R, Stein L A, et al. Semantic web development:Technical Proposal [EB/OL].2000[2010-12-10]. http://www.w3.org/2000/01/sw/Development Proposal.
    [24]. Semantic Web Advanced Development for Europe.[EB/OL].2001 [2010-12-11]. http://www.w3.org/2001/sw/Europe/.
    [25]. Tuttle M S, Brown S H, Campbell K E, et al. The Semantic Web As "Perfection Seeking:" A View from Drug Terminology[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1). Stanford, CA, USA,2001:5-16.
    [26]. Garcia R, Delgado J. Brokerage of intellectual property rights in the semantic web[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:245-260.
    [27]. Cranefield S. UML and the Semantic Web[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:113-130.
    [28]. Anutariya C, Wuwongse V, Akama K, et al. Semantic Web modeling and programming with XDD[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:161-180.
    [29]. Euzenat J. An Infrastructure for Formally Ensuring Interoperability in a Heterogeneous Semantic Web[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:345-360.
    [30]. Klein M, Abraham B. Searching for Services on the Semantic Web Using Process Ontologies[C]. Proceedings of the First Semantic Web Working Symposium (SWWS-1), Stanford, CA, USA,2001:431-446.
    [31]. Antoniou G, Van Harmelen F. A semantic Web primer.[M].2nd.ed. MIT, USA:The MIT Press,2008.
    [32]. Decker S, Jannink J, Mitra P, et al. An Information Food Chain for Advanced Applications on the WWW[C]. Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries, Springer,2000:490-498.
    [33]. Sure Y, Studer R, Fensel C D, et al. On-To-Knowledge methodology-final version[R]. Institute AIFB, University of Karlsruhe,2002.
    [34]. Benjamins V R, Contreras J, Corcho O, et al. Six challenges for the Semantic Web[C]. Proceedings of the Semantic Web workshop held at KR-2002, Toulouse, France. April, 2002.
    [35]. Uschold M. Where are the Semantics in the Semantic Web?[J]. AI Magazine,2003, 24(3):25.
    [36]. Marshall C C. Toward an ecology of hypertext annotation[C]//Proceedings of the ninth ACM conference on Hypertext and hypermedia:links, objects, time and space—structure in hypermedia systems:links, objects, time and space—structure in hypermedia systems, New York, USA, ACM,1998:40-49.
    [37]. Bush V. As We May Think[J]. Atlantic Monthly,1945,176(1):101-108.
    [38]. Vasudevan V, Palmer M. On web annotations:Promises and pitfalls of current web infrastructure[C]. Proceedings of the 32nd Annual Hawaii International Conference on System Sciences-Volume 2, Maui, HI, USA, IEEE,1999:2012.
    [39]. Heck R M, Luebke S M, Obermark C H. A survey of web annotation systems[EB/OL]. 1999[2010-12-23], http://www.math.grin.edu/-rebelsky/Blazers/Annotations/Summer 1999/Papers/survey_paper.html.
    [40]. Golder S, Huberman B A. The structure of collaborative tagging systems[R]. HP Labs, 2005.
    [41]. Buitelaar P, Hasida K. Semantic Annotation and Intelligent Content[C/OL]. Proceedings of the Workshop Supported by SIGLEX, the ACL Special Interest Group on the Lexicon, Centre Universitaire Luxembourg,2000[2010-12-05]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.5340.
    [42]. Handschuh S, Staab S. Annotation of the shallow and the deep web[M]. Annotation for the semantic web. The Netherlands:IOS Press,2003:25-45.
    [43]. Ding Y. Semantic Annotation for The Semantic Web:A Research Area Study[R]. Provo, Utah:Brigham Young University,2005.
    [44]. Hong M, Tang J, Li J. Semantic Annotation Using Horizontal and Vertical Contexts[C]. Proceedings of the First Asian Semantic Web Conference, Beijing, China,2006:58-64.
    [45]. Dingli A, Ciravegna F, Wilks Y. Automatic Semantic Annotation using Unsupervised Information Extraction and Integration[C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at Second International Conference on Knowledge Capture (KCAP-2003), Sanibel, Florida, USA,2003[2010-12-05]. http://citeseer. ist.psu.edu/viewdoc/summary?doi=10.1.1.58.3540.
    [46]. Kiyavitskaya N, Zeni N, Cordy J R, et al. Semi-Automatic Semantic Annotations for Web Documents[C/OL]. Proceedings of SWAP 2005,2nd Italian Semantic Web Workshop, Trento, Italy, December 2005[2010-12-01]. http://sunsite.informatik. rwth-aachen.de/Publications/CEUR-WS/Vol-166/27.pdf.
    [47]. Popov B, Kiryakov A, Kirilov A, et al. KIM-semantic annotation platform[C]. Proceedings of the Second International Semantic Web Conference. Florida, USA, 2003:834-849.
    [48]. Kiryakov A, Popov B, Terziev I, et al. Semantic annotation, indexing, and retrieval[J]. Web Semantics:Science, Services and Agents on the World Wide Web,2004,2(1): 49-79.
    [49]. Euzenat J. Eight questions about semantic Web annotations[J]. Intelligent Systems, IEEE,2005,17(2):55-62.
    [50]. Reeve L, Han H. Survey of semantic annotation platforms[C]. Proceedings of the 2005 ACM symposium on Applied Computing, Santa Fe, New Mexico, USA,2005: 1634-1638.
    [51]. Hung J C. The semantic annotated documents:from HTML to the semantic web[C]. Proceedings of the 2007 annual Conference on International Conference on Computer Engineering and Applications, Queensland, Australia,2007:413-418.
    [52]. Uren V, Cimiano P, Iria J, et al. Semantic annotation for knowledge management: Requirements and a survey of the state of the art[J]. Web Semantics:Science, Services and Agents on the World Wide Web,2006,4(1):14-28.
    [53]. Handschuh S. Creating Ontology-based Metadata by Annotation for the Semantic Web[D]. Karlsruhe:University of Karlsruhe (TH), Institut AIFB,2005.
    [54]. Welty C, Ide N. Using the right tools:enhancing retrieval from marked-up documents[J]. Computers and the Humanities,1999,33(1):59-84.
    [55]. Sahay R, Akhtar W, Fox R. PPEPR:plug and play electronic patient records[C] Proceedings of the 23rd Annual ACM Symposium on Applied Computing, the Semantic Web and Applications (SWA 2008), Fortaleza, Ceara, Brazil,2008,2998-2304.
    [56]. Bontcheva K, Wilks Y. Automatic report generation from ontologies:the MIAKT approach[C]. Proceedings of the 9th International Conference on Applications of Natural Language to Information Systems, Salford, UK,2004:324-335
    [57]. Friedland N S, Allen P G, Matthews G, et al. Project halo:Towards a digital aristotle[J]. AI Magazine,2004,25(4):29.
    [58]. Dowman M, Tablan V, Cunningham H, et al. Web-assisted annotation, semantic indexing and search of television and radio news[C]. Proceedings of the 14th international conference on World Wide Web, Chiba, Japan,2005:225-234.
    [59]. Rinaldi F, Schneider G, Kaljurand K, et al. Mining relations in the GENIA corpus[C]. Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioin-formatics, Pisa, Italy,2004:61-68.
    [60]. Plessers P, Casteleyn S, Yesilada Y, et al. Accessibility:a Web engineering approach[C]. Proceedings of the 14th international conference on World Wide Web, Chiba, Japan,2005:353-362.
    [61]. Maynard D, Yankova M, Aswani N, et al. Automatic creation and monitoring of semantic metadata in a dynamic knowledge portal[M]. Artificial Intelligence: Methodology, Systems, and Applications. Berlin:Springer,2004:65-74.
    [62]. Svab O, Labsky M, Svatek V. RDF-based Retrieval of Information Extracted from Web Product Catalogues[C/OL]. Proceedings of the SIGIR'04 Semantic Web Workshop, Sheffield,2004[2010-12-04]. http://rainbow.vse.cz/swir04fi.pdf.
    [63]. Hunter J, Schroeter R, Koopman B, et al. Using the Semantic Grid to Build Bridges between Museums and Indigenous Communities [C]. Proceedings of the GGF11—Semantic Grid Applications Workshop, Honolulu, June 10,2004:46-61.
    [64]. Azouaou F, Chen W, Desmoulins C. Semantic Annotation Tools for Learning Material [C/OL]. Proceedings of International Workshop on Applications of Semantic Web Technologies for E-Learning (SW-EL), The Netherlands,2004[2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.7162.
    [65]. Montero S, Diaz P, Aedo I. A semantic representation for domain-specific patterns[M]. Metainformatics. Berlin:Springer,2005:129-140.
    [66]. Kahan J, Koivunen M R, Prud'Hommeaux E, et al. Annotea:an open RDF infrastructure for shared Web annotations [J]. Computer Networks,2002,39(5): 589-608.
    [67]. Koivunen M R. Annotea and semantic web supported coIlaboration[C].Invited talk at Workshop on User Aspects of the Semantic Web (User-SWeb) at European Semantic Web Conference, Heraklion, Crete, Greece,2005:5-17.
    [68]. Handschuh S, Staab S. Authoring and annotation of web pages in CREAM[C]. Proceedings of the 11th international conference on World Wide Web, Honolulu, Hawaii. USA,2002:462-473.
    [69]. DeRose S, Maler E, Daniel R. XML pointer language (XPointer) version 1.0[EB/OL]. W3C Candidate Recommendations,2001 [2010-12-01]. http://www. w3. org/TR.
    [70]. Quint V, Vatton I. An introduction to Amaya[J]. World Wide Web Journal,1997,2(2): 39-46.
    [71]. Schroeter R, Hunter J, Kosovic D. Vannotea-a collaborative video indexing, annotation and discussion system for broadband networks [C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at Second International Conference on Knowledge Capture (KCAP-2003), Sanibel, Florida, USA,2003 [2010-12-05]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi= 10.1.1.71.7336.
    [72]. Handschuh S, Staab S, Studer R. Leveraging metadata creation for the Semantic Web with CREAM[M]. Advances in Artificial Intelligence. Berlin:Springer,2003:19-33.
    [73]. Bloehdorn S, Petridis K, Saathoff C, et al. Semantic annotation of images and videos for multimedia analysis[C]. Proceedings of the 2nd European Semantic Web Conference (ESWC 2005), Heraklion, Crete, Greece,2005:592-607.
    [74]. McDowell L, Etzioni O, Gribble S, et al. Mangrove:Enticing ordinary people onto the semantic web via instant gratification[C]. Proceedings of The 2nd International Semantic Web Conference(ISWC 2003), Sanibel Island, FL, USA,2003:754-770.
    [75]. McDowell L, Etzioni O, Halevy A. Semantic email:theory and applications[J]. Web Semantics:Science, Services and Agents on the World Wide Web,2004,2(2): 153-183.
    [76]. Handschuh S, Staab S, Maedche A. CREAM:creating relational metadata with a component-based, ontology-driven annotation framework [C]. Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), Victoria, BC, Canada, 2001:76-83.
    [77]. Ciravegna F, Wilks Y. Designing adaptive information extraction for the semantic web in amilcare[M]. Annotation for the semantic web. The Netherlands:IOS Press,2003: 112-127.
    [78]. Handschuh S, Volz R, Staab S. Annotation for the deep web[J]. Intelligent Systems, IEEE,2005,18(5):42-48.
    [79]. Handschuh S, Staab S, Volz R. On deep annotation[C]. Proceedings of the 12th International World Wide Web Conference(WWW 2003), Budapest, Hungary,2003: 431-438.
    [80]. Handschuh S, Staab S, Volz R, et al. Deep Annotation for Information Integration[C]. Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), Acapulco, Mexico,2003:105-110
    [81]. Volz R, Handschuh S, Staab S, et al. Unveiling the hidden bride:deep annotation for mapping and migrating legacy data to the semantic web[J]. Web Semantics:Science, Services and Agents on the World Wide Web,2004,1(2):187-206.
    [82]. Staab S, Maedche A, Handschuh S. An annotation framework for the semantic web[C/OL]. Proceedings of the First Workshop on Multimedia Annotation, Tokyo, Japan.2001 [2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25. 910.
    [83]. Heflin J, Hendler J. A portrait of the Semantic Web in action[J]. Intelligent Systems, IEEE,2005,16(2):54-59.
    [84]. Golbeck J, Grove M, Parsia B, et al. New tools for the semantic web[C]. Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management(EKAW 2002), Siguenza, Spain,2002:392-400.
    [85]. Kalyanpur A, Hendler J, Parsia B, et al. SMORE-semantic markup, ontology, and RDF editor[R]. Maryland University,2006.
    [86]. Collier N, Kawazoe A, Kitamoto A A, et al. Integrating deep and shallow semantic structures in open ontology forge[C/OL]. Proceedings of the Special Interest Group on Semantic Web and Ontology, JSAI (Japanese Society for Artificial Intelligence), vol. SIG-SWO-A402-05,2004[2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/summary? doi=10.1.1.1.5458.
    [87]. Goble S B. Towards Annotation using DAML+OIL[C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at First International Conference on Knowledge Capture (KCAP-2001), Victoria, BC, Canada,2001 [2010-12-05]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.7625.
    [88]. Carr L A, DeRoure D C, Hall W, et al. The distributed link service:A tool for publishers, authors and readers[C/OL]. Fourth International World Wide Web Conference, Boston, Massachusetts, USA,1995[2010-12-01]. http://www.w3. org/Conferences/WWW4/Papers/178/.
    [89]. Bechhofer S, Goble C, Carr L, et al. COHSE:conceptual open hypermedia service[M]. Annotation for the semantic web. The Netherlands:IOS Press,2003:193-211.
    [90]. Davies N J, Davies J, Studer R, et al. Semantic Web technologies:trends and research in ontology-based systems[M]. West Sussex, PO198SQ, England:John Wiley & Sons, 2006.
    [91]. Baumgartner R, Flesca S, Gottlob G. Visual web information extraction with lixto[C]. Proceedings of 27th International Conference on Very Large Data Bases(VLDB 2001), Roma, Italy,2001:119-128.
    [92]. VARGAS-VERA MOTTA M E, DOMINGUE J, LANZONI M, STUTT A, CIRAVEGNA F. MnM:Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup[C]. Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management(EKAW 2002), Siguenza, Spain,2002: 379-391.
    [93]. Ciravegna F, Dingli A, Petrelli D, et al. User-system cooperation in document annotation based on information extraction[C]. Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management(EKAW 2002), Siguenza, Spain,2002:122-137.
    [94]. Gilardoni L, Biasuzzi C, Ferraro M, et al. Machine Learning for the Semantic Web: Putting the user into the cycle[C/OL]. Dagstuhl Seminar Proceedings 05071, Schloss Dagstuhl, Germany 2005[2010-12-01]. http://www.quinary.com/wp-content/uploads/ 2008/03/quinarydagstuhl.pdf.
    [95]. Black W J, McNaught J, Vasilakopoulos A, et al. CAFETIERE conceptual annotations for facts, events, terms, individual entities, and relations[R]. UMIST, Manchester, UK, 2003.
    [96]. Vasilakopoulos A, Bersani M, Black W J. A suite of tools for marking up textual data for temporal text mining scenarios[C]. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal,2004:24-30.
    [97]. Mikroyannidis A, Theodoulidis B, Persidis A. PARMENIDES:towards business intelligence discovery from Web data[C]. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), Hong Kong, China,2006: 1057-1060.
    [98]. Siliopoulou M, Rinaldi F, Black W J, et al. Coupling information extraction and data mining for ontology learning in PARMENIDES[C]. Proceedings of the Recherche d'Information Assistee par Ordinateur (RIAO'2004), Avignon, France,2004:156-169.
    [99]. Ciravegna F, Chapman S, Dingli A, et al. Learning to harvest information for the semantic web[C]. The Semantic Web:Research and Applications, First European Semantic Web Symposium(ESWS 2004), Heraklion, Crete, Greece,2004:312-326.
    [100].Etzioni O, Cafarella M, Downey D, et al. Unsupervised named-entity extraction from the web:An experimental study [J]. Artificial Intelligence,2005,165(1):91-134.
    [101].Buitelaar P, Ramaka S. Unsupervised ontology-based semantic tagging for knowledge markup[C]. Proceedings of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Banff, Alberta, Canada, 2005 [2010-12-01].http://citeseerx.ist.psu.edu/viewdoc/summary?doi= 10.1.1.71.2586
    [102].Cimiano P, Handschuh S, Staab S. Towards the self-annotating web[C]. Proceedings of the 13th international conference on World Wide Web(WWW 2004), New York, NY, USA,2004:462-471.
    [103].Cimiano P, Ladwig G, Staab S. Gimme'the context:context-driven automatic semantic annotation with C-PANKOW[C]. Proceedings of the 14th international conference on World Wide Web(WWW 2005), Chiba, Japan,2005:332-341.
    [104].Kogut P. Holmes W. AeroDAML:Applying information extraction to generate DAML annotations from web pages[C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at First International Conference on Knowledge Capture (KCAP-2001), Victoria, BC, Canada,2001 [2010-12-05]. http://citeseerx.ist.psu.edu/ viewdoc/summary?doi= 10.1.1.21.8180.
    [105].Dill S, Eiron N, Gibson D, et al. A case for automated large-scale semantic annotation[J]. Web Semantics:Science, Services and Agents on the World Wide Web, 2003,1(1):115-132.
    [106].Dill S. Eiron N, Gibson D, et al. SemTag and seeker:bootstrapping the semantic web via automated semantic annotation[C]. Proceedings of the 12th International World Wide Web Conference(WWW 2003). Budapest, Hungary,2003:178-186.
    [107].Svatek V, Labsky M, Vacura M. Knowledge modelling for deductive web mining[C]. Engineering Knowledge in the Age of the Semantic Web,14th International Conference (EKAW 2004), Whittlebury Hall, UK,2004:337-353.
    [108].Cunningham D H, Maynard D D, Bontcheva D K, et al. GATE:A framework and graphical development environment for robust NLP tools and applications[C]. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, Philadelphia, PA,2002:168-175.
    [109].Maynard D, Yankova M, Kourakis A, et al. Ontology-based information extraction for market monitoring and technology watch[C/OL]. ESWC Workshop End User Aspects of the Semantic Web, Heraklion, Crete,2005. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3315.
    [110].LI Jianming, ZHANG Lei, YU Yong. Learning to Generate Semantic Annotation for Domain Specific Sentences[C/OL]. Proceedings of the First International Conference on Knowledge Capture Workshop on Knowledge Markup and Semantic Annotation, Victoria, B.C., Canada,2001 [2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.8769.
    [111].Miller G A, Beckwith R, Fellbaum C, et al. Introduction to WordNet:An On-line Lexical Database[J]. International Journal of Lexicography,1990,3(4):235-244.
    [112].Stojanovic N, Stojanovic L, Volz R. A reverse engineering approach for migrating data-intensive web sites to the Semantic Web[C]. Intelligent Information Processing, IFIP 17th World Computer Congress, TC12 Stream on Intelligent Information Processing, Montreal, Quebec, Canada,2002:141-154.
    [113].Mukherjee S, Yang G, Ramakrishnan I V. Automatic annotation of content-rich html documents:Structural and semantic analysis[C]. Proceedings of The 2nd International Semantic Web Conference(ISWC 2003), Sanibel Island, FL, USA,2003:533-549.
    [114].Khelif K, Dieng-Kuntz R. Ontology-based semantic annotations for biochip domain[C]. Engineering Knowledge in the Age of the Semantic Web,14th International Conference (EKAW 2004), Whittlebury Hall, UK,2004:483-484.
    [115].Bourigault D, Fabre C. Approche linguistique pour l'analyse syntaxique de corpus[J]. Cahiers de grammaire,2000,25:131-151.
    [116].Cunningham H, Maynard D, Tablan V. JAPE:A java annotation patterns engine[R]. Research Memorandum CS-00-10, Department of Computer Science, University of Sheffield,2000.
    [117].Kiyavitskaya N, Zeni N, Mich L, et al. Text mining through semi automatic semantic annotation[C]. Practical Aspects of Knowledge Management 6th International Conference (PAKM 2006), Vienna, Austria,2006:143-154.
    [118].Zeni N, Kiyavitskaya N, Mich L, et al. A lightweight approach to semantic annotation of research papers[C]. Natural Language Processing and Information Systems,12th International Conference on Applications of Natural Language to Information Systems (NLDB 2007), Paris, France,2007:61-72.
    [119].Michelson M, Knoblock C A. Semantic annotation of unstructured and ungrammatical text[C]. Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland,2005:1091-1098.
    [120].Davis B, Handschuh S, Cunningham H, et al. Further use of controlled natural language for semantic annotation of wikis[C/OL]. Proceedings of the 1st Semantic Authoring and Annotation Workshop at ISWC2006, Athens, Georgia, USA,2006 [2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.4541.
    [121].Tenier S, Toussaint Y, Napoli A, et al. Instantiation of Relations for Semantic Annotation[C]. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, China,2006:463-472.
    [122].Han L, Chen G, Xie L, et al. AASA:a Method of Automatically Acquiring Semantic Annotations[J]. Journal of Information Science,2007,33(4):435-450.
    [123].Laclavik M, Seleng M, Hluchy L. Towards Large Scale Semantic Annotation Built on MapReduce Architecture[C]. Proceedings of Computational Science-ICCS 2008,8th International Conference, Part III, Krakow, Poland,2008:331-338.
    [124].Laclavik M, Seleng M, Ciglan M. ONTEA:Platform For Pattern Based Automated Semantic Annotation[J]. Computing and Informatics,2009,28(4):555-579.
    [125].Fernandez-Garcia N, Blazquez-del-Toro J M, Sanchez-Fernandez L, et al. Exploiting User Queries and Web Communities in Semantic Annotation[C/OL]. Proceedings of the 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) located at the 4th International Semantic Web Conference ISWC 2005, Galway, Ireland,2005 [2010-12-10]. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-185/semAnnot05-02.pdf.
    [126].Alani H, Kim S, Millard D E, et al. Automatic Ontology-based Knowledge Extraction from Web Documents [J]. IEEE Intelligent Systems,2003,18(1):14-21.
    [127].Lai Y S, Wang R J. Towards Automatic Knowledge Acquisition from Text Based on Ontology-centric Knowledge Representation and Acquisition[C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at Second International Conference on Knowledge Capture (KCAP-2003), Sanibel, Florida, USA,2003 [2010-12-05]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.9876&rep=rep1&type=pdf.
    [128].Schutz A, Buitelaar P. RelExt:A Tool for Relation Extraction from Text in Ontology Extension[C]. Proceedings of the 4th International Semantic Web Conference (ISWC 2005), Galway, Ireland,2005:593-606.
    [129].Carr L, Miles-Board T, Woukeu A, et al. The case for explicit knowledge in documents[C]. Proceedings of the 2004 ACM symposium on Document engineering, Milwaukee, Wisconsin,2004:90-98.
    [130].Lanfranchi V, Ciravegna F, Petrelli D. Semantic Web-based document:editing and browsing in AktiveDoc[C]. The Semantic Web:Research and Applications, Second European Semantic Web Conference (ESWC 2005), Heraklion, Crete, Greece,2005:623-632.
    [131].Tallis M. Semantic word processing for content authors [C/OL]. Proceedings of Knowledge Markup and Semantic Annotation Workshop at Second International Conference on Knowledge Capture (KCAP-2003), Sanibel, Florida, USA, 2003[2010-12-05].http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.762.
    [132].Groza T, Handschuh S, Möller K, et al. SALT-Semantically Annotated LaTeX for Scientific Publications[C]. The Semantic Web:Research and Applications,4th European Semantic Web Conference (ESWC 2007), Innsbruck, Austria,2007:518-532.
    [133].Groza T, Möller K, Handschuh S, et al. SALT:Weaving the claim web[C]. The Semantic Web,6th International Semantic Web Conference,2nd Asian Semantic Web Conference (ISWC 2007+ASWC 2007), Busan, Korea,2007:197-210.
    [134].Domingue J, Dzbor M. Magpie:supporting browsing and navigation on the semantic web[C]. Proceedings of the 9th international conference on Intelligent user interfaces, Funchal, Portugal,2004:191-197.
    [135].Dzbor M, Motta E, Domingue J. Opening up magpie via semantic services[C]. The Semantic Web-ISWC 2004:Third International Semantic Web Conference, Hiroshima, Japan,2004:635-649.
    [136].Hogue A, Karger D. Thresher:automating the unwrapping of semantic content from the World Wide Web[C]. Proceedings of the 14th international conference on World Wide Web (WWW 2005), Chiba, Japan,2005:86-95.
    [137].Huynh D, Karger D, Quan D. Haystack:A platform for creating, organizing and visualizing information using RDF[C]. Proceedings of the 8th international conference on Intelligent User Interfaces, Miami, USA,2002:323.
    [138].Che Haiyan. Automatic knowledge extraction and knowledge fusion for Chinese natural-language Web documents[D]. Changchun:Jilin University,2008.
    [139].Che H, Jing T, Sun J, et al. A prototype of semantic-based intelligent search engine for Chinese documents[C]. Proceedings of Fourth International Conference on Fuzzy System and Knowledge Discovery (FSKD 2007), Haikou, Hainan, China,2007:663-667.
    [140].Jing Tao, Zuo Wanli, Sun Jigui, et al. Semantic annotation of Chinese web pages:from sentences to RDF representation[J]. Journal of Computer Research and Development,2008,45(7):1221-1231.
    [141].Neches R, Fikes R E, Finin T, et al. Enabling technology for knowledge sharing[J]. AI magazine,1991,12(3):36-56.
    [142].Gruber T R. A translation approach to portable ontology specifications[J]. Knowledge Acquisition,1993,5(2):199-220.
    [143].Borst W N. Construction of engineering ontologies for knowledge sharing and reuse[D]. Enschede:Universiteit Twente,1997.
    [144].Studer R, Benjamins V R, Fensel D. Knowledge engineering:principles and methods[J]. Data & knowledge engineering,1998,25(1-2):161-197.
    [145].Guarino N, Giaretta P. Ontologies and knowledge bases:Towards a terminological clarification[J]. Towards very large knowledge bases:knowledge building and knowledge sharing,1995,1(9):25-32.
    [146].Schreiber G, Wielinga B, Jansweijer W. The KACTUS view on the 'O' word[C]. Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Quebec, Canada,1995:159-168.
    [147].Bernaras A, Laresgoiti I, Corera J. Building and Reusing Ontologies for Electrical Network Applications[C]. Proceedings of 12th European Conference on Artificial Intelligence (ECAI 1996), Budapest, Hungary,1996:298-302.
    [148].Swartout B, Patil R, Knight K, et al. Toward distributed use of large-scale ontologies[C/OL]. Proceedings of the 10th Workshop on Knowledge Acquisition for Knowledge-Based Systems, Banff, Alberta, Canada,1996 [2010-12-05]. http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/Banff_96_final_2.html.
    [149].Gomez-Perez A, Corcho O. Ontology Languages for the Semantic Web[J]. IEEE Intelligent Systems,2002,17(1):54-60.
    [150].Kent R. Conceptual Knowledge Markup Language:The Central Core[C/OL]. Proceedings of the 12th Workshop on Knowledge Acquisition for Knowledge-Based Systems, Banff, Alberta, Canada,1999 [2010-12-05]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.2191&rep=rep1&type=pdf.
    [151].Karp R, Chaudhri V, Thomere J. XOL:An XML-Based Ontology Exchange Language[R]. AI Center, SRI International,1999.
    [152].Lassila O, Swick R R. Resource Description Framework (RDF) Model and Syntax Specification[EB/OL]. W3C-World Wide Web Consortium,1999 [2010-12-01]. http://www.w3.org/TR/REC-rdf-syntax.
    [153].Brickley D, Guha R V. RDF Vocabulary Description Language 1.0:RDF Schema[EB/OL]. W3C Recommendation,2004 [2010-12-10]. http://www.w3.org/TR/rdf-schema/.
    [154].Horrocks I, Fensel D, Harmelen F, et al. OIL in a Nutshell[C]. Proceedings of the ECAI'00 Workshop on Application of Ontologies and PSMs, Berlin, Germany, 2000:1-16.
    [155].van Harmelen F, Patel-Schneider P F, Horrocks I. Reference description of the DAML+OIL (March 2001) ontology markup language[EB/OL]. DAML+OIL Document,2001 [2010-12-10]. http://www.daml.org/2000/12/reference.html.
    [156].Dean M, Schreiber G, Bechhofer S, et al. OWL Web Ontology Language Reference[EB/OL]. W3C Recommendation,2004 [2010-12-12]. http://www.w3.org/TR/owl-ref/.
    [157].Manola F, Miller E, McBride B. RDF Primer[EB/OL]. W3C Recommendation, 2004[2010-12-20]. http://www.w3.org/TR/rdf-primer/.
    [158].Benjamin P C, Menzel C P, Mayer R J, et al. IDEF5 method report[R]. Knowledge Based Systems, Inc,1994.
    [159].Schreiber G, Wielinga B, de Hoog R, et al. CommonKADS:A comprehensive methodology for KBS development[J]. IEEE Expert,1994,9(6):28-37.
    [160].Jarrar M, Meersman R. Formal Ontology Engineering in the DOGMA Approach[C]. On the Move to Meaningful Internet Systems, Proceedings of Ontologies, DataBases, and Applications of Semantics for Large Scale Information Systems (ODBASE 2002), Irvine, California,2002:1238-1254.
    [161].Spyns P, Meersman R, Jarrar M. Data modelling versus ontology engineering[J]. ACM SIGMOD Record,2002,31(4):12-17.
    [162].Lenat D B, Prakash M, Shepherd M. CYC:Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks[J]. AI Magazine,1985,6(4):65-85.
    [163].Uschold M, King M. Towards a methodology for building ontologies[C]. Proceedings of Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95, Montreal, Quebec, Canada,1995:275-280.
    [164].Uschold M, King M, Moralee S, et al. The enterprise ontology [J]. The knowledge engineering review,1998,13(1):31-89.
    [165].Fernandez M, Gomez-Perez A, Juristo N. Methontology:from ontological art towards ontological engineering[C]. Proceedings of the AAAI97 Spring Symposium Series on Ontological Engineering, Stanford University, California, USA,1997:33-40.
    [166].Arpirez J C, Corcho O, Fernandez-Lopez M, et al. WebODE:a scalable workbench for ontological engineering [C]. Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), Victoria, BC, Canada,2001:6-13.
    [167].Swartout B, Patil R, Knight K, et al. Ontosaurus:a tool for browsing and editing ontologies[C/OL]. Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada,1996 [2010-12-01]. http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/ontosaurus_demo.html.
    [168].Fox M. The TOVE project:towards a common-sense model of the enterprise[C]. Proceedings of Industrial and Engineering Applications of Artificial Intelligence and Expert Systems,5th International Conference (IEA/AIE-1992), Paderborn, Germany,1992:25-34.
    [169].Holsapple C W, Joshi K D. A collaborative approach to ontology design[J]. Communications of the ACM,2002,45(2):42-47.
    [170].Kotis K, Vouros G. Human centered ontology management with HCONE[C/OL]. Proceedings of Ontologies and Distributed Systems Workshop, IJCAI-03 Conference, Acapulco, Mexico,2003 [2010-12-01]. http://www.icsd.aegean.gr/kotis/publications/IJCAI2003.PDF.
    [171].Kotis K, Vouros G A, Alonso J P. HCOME:tool-supported methodology for collaboratively devising living ontologies[C]. Proceedings of Semantic Web and Databases, Second International Workshop (SWDB 2004), Toronto, Canada, 2004:155-166.
    [172].Sure Y, Erdmann M, Angele J, et al. OntoEdit:Collaborative ontology development for the semantic web[C]. The Semantic Web-ISWC 2002, First International Semantic Web Conference, Sardinia, Italy,2002:221-235.
    [173].De Nicola A, Missikoff M, Navigli R. A proposal for a Unified Process for ONtology building:UPON[C]. Database and Expert Systems Applications,16th International Conference (DEXA 2005), Copenhagen, Denmark,2005:655-664.
    [174].Kietz J U, Maedche A, Volz R. A method for semi-automatic ontology acquisition from a corporate intranet[C/OL]. Proceedings of Workshop Ontologies and Text, co-located with the 12th International Workshop on Knowledge Engineering and Knowledge Management (EKAW'2000), Juan-Les-Pins, France,2000 [2010-12-01]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.7576.
    [175].Noy N F, McGuinness D L. Ontology development 101:A guide to creating your first ontology[R]. Stanford Medical Informatics,2001.
    [176].Pinto H S, Staab S, Tempich C. DILIGENT:Towards a fine-grained methodology for Distributed, Loosely-controlled and evolving Engineering of oNTologies[C]. Proceedings of the 16th European Conference on Artificial Intelligence (ECAI'2004), including Prestigious Applications of Intelligent Systems (PAIS 2004), Valencia, Spain,2004:393-397.
    [177].Vrandecic D, Pinto S, Tempich C, et al. The DILIGENT knowledge processes[J]. Journal of Knowledge Management,2005,9(5):85-96.
    [178].Laender A H, Ribeiro-Neto B A, da Silva A S, et al. A brief survey of web data extraction tools[J]. ACM Sigmod Record,2002,31(2):84-93.
    [179].Chang C H, Kayed M, Girgis M R, et al. A survey of web information extraction systems [J]. IEEE Transactions on Knowledge and Data Engineering,2006,18(10): 1411-1428.
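    [180].Dong Zhendong, Dong Qiang. HowNet[EB/OL]. Introduction to the HowNet literature,1999 [2010-12-01]. http://www.keenage.com.
    [181].Dong Zhendong, Dong Qiang, Hao Changling. Theoretical findings of HowNet[J]. Journal of Chinese Information Processing,2007,21(4):3-9.
    [182].Duan Gang. Encryption and Decryption[M]. 3rd ed. Beijing:Publishing House of Electronics Industry,2008.
    [183].Petzold. Programming Windows[M]. 5th ed. Translated by Beijing Boyan Science and Technology Development Co., Ltd. Beijing:Peking University Press,1999.
    [184].Robbins. Debugging Applications[M]. Translated by Pan Wenlin, Chen Wu. Beijing:Tsinghua University Press,2001.
    [185].Kanxue Academy. Inside Software Encryption Technology[M]. Beijing:Publishing House of Electronics Industry,2004.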
    [186].Lai Y, Wang R, Hsu W. A DAML+OIL-compliant Chinese lexical ontology[C]. Proceedings of the 19th International Conference on Computational Linguistics, Taipei, Taiwan,2002:1238-1242.
    [187].Menczer F, Pant G, Srinivasan P. Topical web crawlers:Evaluating adaptive algorithms[J]. ACM Transactions on Internet Technology (TOIT),2004,4(4):378-419.
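    [188].Jing Tao, Zuo Wanli. A web page noise removal algorithm based on visual layout information[J]. Journal of South China University of Technology (Natural Science Edition),2004(S1):84-87,98.
    [189].Gupta S, Kaiser G, Neistadt D, et al. DOM-based content extraction of HTML documents[C]. Proceedings of the Twelfth International World Wide Web Conference (WWW 2003), Budapest, Hungary,2003:207-214.
    [190].Huang Changning, Zhao Hai. Chinese word segmentation:a decade review[J]. Journal of Chinese Information Processing,2007,21(3):8-19.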
    [191].Yang Y, Pedersen J O. A Comparative Study on Feature Selection in Text Categorization[C]. Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), Nashville, Tennessee, USA,1997:412-420.
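    [192].Wang Rui, Zhang Jie, Zhang Youyi, et al. A Chinese named entity extraction system based on a hybrid model[J]. Journal of Tsinghua University (Science and Technology),45(S1):1908-1914.
    [193].Zhang Huaping, Liu Qun. Automatic recognition of Chinese personal names based on role tagging[J]. Chinese Journal of Computers,2004,27(1):85-91.
    [194].Wang Ning, Ge Ruifang, Yuan Chunfa, et al. Company name identification in Chinese financial news[J]. Journal of Chinese Information Processing,2002,16(2):1-6.
    [195].Yu Hongkui, Zhang Huaping, Liu Qun, et al. Chinese named entity identification using cascaded hidden Markov model[J]. Journal on Communications,2006,27(2):87-94.
    [196].Zhou Junsheng, Dai Xinyu, Yin Cunyan, et al. Automatic recognition of Chinese organization names based on cascaded conditional random fields[J]. Acta Electronica Sinica,2006,34(5):804-809.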
    [197].Gao Jianfeng, Li Mu, Huang Changning, et al. Chinese word segmentation and named entity recognition:A pragmatic approach [J]. Computational Linguistics,2005, 31(4):531-574.
    [198].Cui Shiqi, Liu Qun, Meng Yao, et al. New word detection based on a large-scale corpus[J]. Journal of Computer Research and Development,2006,43(5):927-932.
    [199].Tesnière L. Éléments de syntaxe structurale[M]. Paris:Klincksieck,1959.
    [200].Eisner J M. Three new probabilistic models for dependency parsing:An exploration[C]. Proceedings of the 16th conference on Computational linguistics-Volume 1 (COLING '96), Copenhagen, Denmark,1996:340-345.
    [201].McDonald R, Crammer K, Pereira F. Online large-margin training of dependency parsers[C]. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL 2005), University of Michigan, USA,2005:91-98.
    [202].Yamada H, Matsumoto Y. Statistical dependency analysis with support vector machines [C]. Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 2003), Nancy, France,2003:195-206.
    [203].Zong Chengqing. Statistical Natural Language Processing[M]. Beijing:Tsinghua University Press,2008.
    [204].Levy R, Manning C. Is it harder to parse Chinese, or the Chinese Treebank?[C]. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), Sapporo, Japan,2003:439-446.
    [205].de Marneffe M C, MacCartney B, Manning C D. Generating typed dependency parses from phrase structure parses[C]. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), Genoa, Italy,2006:449-454.
    [206].de Marneffe M C, Manning C D. The Stanford typed dependencies representation[C]. Coling 2008:Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation, Manchester, UK,2008:1-8.
    [207].Thompson C A, Califf M E, Mooney R J. Active learning for natural language parsing and information extraction[C]. Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), Bled, Slovenia,1999:406-414.
    [208].Settles B. Active Learning Literature Survey[R]. Computer Sciences Technical Report 1648, University of Wisconsin-Madison,2009:1-46.
    [209].Tong S. Active learning:theory and applications[D]. Palo Alto, California:Stanford University,2001.
    [210].Angluin D. Queries and concept learning[J]. Machine Learning,1988,2(4):319-342.
    [211].Cohn D, Atlas L, Ladner R. Improving generalization with active learning[J]. Machine Learning,1994,15(2):201-221.
    [212].Lewis D D, Gale W A. A sequential algorithm for training text classifiers[C]. Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1994), Dublin, Ireland,1994:3-12.
    [213].Agrawal R, Srikant R. Mining sequential patterns[C]. Proceedings of the Eleventh International Conference on Data Engineering (ICDE 1995), Taipei, Taiwan,1995: 3-14.
    [214].Pei J, Han J, Mortazavi-Asl B, et al. PrefixSpan:Mining sequential patterns efficiently by prefix-projected pattern growth[C]. Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), Heidelberg, Germany,2001:215-224.
    [215].Han J, Kamber M. Data mining:concepts and techniques[M]. 2nd ed. Massachusetts, USA:Morgan Kaufmann,2006.
    [216].Lewis J, Ossowski S, Hicks J, et al. Text similarity:an alternative way to search MEDLINE[J]. Bioinformatics,2006,22(18):2298-2304.
    [217].Srikant R, Agrawal R. Mining sequential patterns:Generalizations and performance improvements[C]. Advances in Database Technology,5th International Conference on Extending Database Technology (EDBT 1996), Avignon, France,1996:1-17.
    [218].Jing Tao. An efficient algorithm for mining Web user access patterns based on suffix arrays[D]. Changchun:Jilin University,2005.
    [219].Manber U, Myers G. Suffix arrays:a new method for on-line string searches[C]. Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California,1990:319-327.
    [220].Yamamoto M, Church K W. Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus[J]. Computational Linguistics,2001, 27(1):1-30.
