With the rapid development of computer technology and the Internet, the Web has become a huge, global, distributed, and shared information space. As a vast resource for people's learning, life, and work, the Web has brought tremendous convenience. Faced with the vast amount of information on the Web, however, people are trapped in the awkward condition of being "data rich, knowledge poor". Because most Web data is in HTML form, applications cannot directly access the information on the Web. Web information extraction technology was developed to resolve this problem.
     This paper analyzes several typical Information Extraction (IE) systems and shows how to extract personalized information based on the individual needs of learners in informatization education. A personalized information extraction system based on the document structure tree has been implemented. The system has two parts: the definition of extraction rules and their execution. In the rule-definition phase, the data represented in HTML is first transformed into a well-formed XML document, and the DOM tree of that XML document is obtained. The user then specifies the location of the information to be extracted and maps it to the target table, which defines the extraction rules. In the rule-execution phase, the system extracts data from the Web page structure using the user-defined extraction rules. Finally, the extracted data is stored in a structured form.
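The two phases described above can be illustrated with a minimal sketch. It is not the system implemented in the paper: the function names (`html_to_tree`, `extract`, `store`), the sample page, the rule format (target column mapped to a path in the tree), and the table name `target` are all assumptions made for illustration. The tidying step is also simplified: it only closes void tags such as `<br>` and tolerates unquoted attributes, whereas real pages need a more robust HTML-to-XML converter.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # tags with no end tag

class TreeBuilder(HTMLParser):
    """Build an element tree from HTML so it can be serialized as well-formed XML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return
        node = self.stack[-1]
        if len(node):  # text that follows a child element is that child's tail
            node[-1].tail = (node[-1].tail or "") + data
        else:
            node.text = (node.text or "") + data

def html_to_tree(html):
    """Rule-definition phase, step 1: HTML -> document structure tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def extract(root, record_path, rules):
    """Rule-execution phase: apply {column: path} rules to every record node."""
    rows = []
    for record in root.findall(record_path):
        row = {}
        for column, path in rules.items():
            node = record.find(path)
            row[column] = "".join(node.itertext()).strip() if node is not None else None
        rows.append(row)
    return rows

def store(rows, columns, table="target"):
    """Store the extracted rows in a structured form (an in-memory SQLite table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    marks = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})",
                     [tuple(row[c] for c in columns) for row in rows])
    return conn

# Hypothetical page: note the unquoted attribute and the unclosed <br>.
PAGE = """<html><body>
<table id=courses>
<tr><td>XML Basics<br></td><td>Li Ming</td></tr>
<tr><td>Data Mining</td><td>Wang Fang</td></tr>
</table>
</body></html>"""

root = html_to_tree(PAGE)
xml_text = ET.tostring(root, encoding="unicode")  # well-formed XML view of the page
rules = {"course": "td[1]", "teacher": "td[2]"}   # user-defined mapping to target columns
rows = extract(root, ".//table[@id='courses']/tr", rules)
conn = store(rows, ["course", "teacher"])
```

In this sketch a rule is only a location in the tree paired with a target column, so the same rule set can be reapplied to any page that shares the structure of the page on which it was defined.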
