Related Entity Finding (REF) is a highly promising research topic in the entity retrieval track of TREC (Text Retrieval Conference). Research on it could substantially change search engines and the way people process information on the web. The REF task is as follows: given the information in a topic, extract from the Internet and related databases the entities that answer the topic, together with the homepage of each entity. This paper surveys the state of the art in China and abroad along with several recent algorithms, and analyzes each stage of the process in turn: keyword extraction and expansion, text retrieval, paragraph segmentation and relevance computation, named entity recognition, entity ranking, and retrieval of supporting documents. The improvements and innovations made to the implementation are as follows:
     (1) Processing of the entire page text is replaced by processing of short text passages. This discards a large amount of page content and reduces the size of the returned text, which improves the processing efficiency of the system.
     (2) Based on the structural features of Wikipedia, a Wikipedia category dictionary is built from the synonyms and hypernyms found in Wikipedia and used in the entity extraction stage. The dictionary fits the entity types of this year's REF task, offers finer-grained categories, and improves the accuracy of entity extraction.
     (3) A word-density-based algorithm is added to correct the results of the DCM (document-centered) model, and it achieves fairly good results. The parameters in the DCM formula were also adjusted according to last year's answers, which improved the model.
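The abstract does not state how passages are cut or how word density is computed, so the following is only a minimal sketch of items (1) and (3) under stated assumptions: passages are fixed-size groups of sentences, and density is the fraction of passage tokens that are topic keywords. The function names and the `max_sentences` parameter are hypothetical and do not come from the original work.

```python
import re


def split_passages(text, max_sentences=3):
    """Cut text into short passages of at most max_sentences sentences (assumed scheme)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def keyword_density(passage, keywords):
    """Fraction of tokens in the passage that are topic keywords (assumed definition)."""
    tokens = re.findall(r"\w+", passage.lower())
    if not tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for t in tokens if t in wanted)
    return hits / len(tokens)


def rank_passages(text, keywords, max_sentences=3):
    """Return (density, passage) pairs, densest passage first."""
    scored = [(keyword_density(p, keywords), p)
              for p in split_passages(text, max_sentences)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Passages with zero density could then be dropped before named entity recognition, which is one way the short-passage approach might reduce the amount of text the later stages must handle.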
