Research on Key Technologies for Chinese Event Extraction
Abstract
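Information extraction is a principal means of automatically obtaining information from text. Information extraction from free text generally covers the extraction of entities and the relations among them. The real world changes continually, however, and the relations and states of entities change with it. Events reflect these changes in the relations and states of the participating entities. To capture changes of state among entities, it is therefore necessary to extract information about events.
     Event detection and recognition (VDR, also called event extraction) has been defined as a basic task by the ACE (Automatic Content Extraction) evaluation. ACE2005 defines the task as identifying events of specified types and determining and extracting the related information, which mainly includes the event type and subtype and the event argument roles. Under this definition, event extraction divides into two core subtasks: (1) event detection and type recognition; (2) extraction of event argument roles. In addition, because the great majority of argument roles are entities, entity recognition is also a basic task of event extraction. This thesis studies information extraction from several angles: event detection and type recognition, recognition of event argument roles, recognition of event triggers, and entity recognition. Finally, it examines confidence estimation methods for event extraction.
     Specifically, this thesis studies the following:
     (1) Recognition of expanded named entities.
     Semi-supervised learning is used to acquire patterns, easing the limitation imposed by the lack of a large annotated corpus of expanded named entities. Specifically, the self-training method Bootstrapping is adopted to acquire patterns automatically. During iteration, a high-accuracy dictionary resource is used to evaluate the confidence of each pattern, and the confidence of each instance is then evaluated through the confidence of the patterns, which avoids the amplification of errors across iterations. On this basis, pattern generalization is studied: two forms of generalized pattern, soft patterns and feature vectors, are proposed, and fuzzy matching is realized by computing joint probability, bigram co-occurrence probability, and similarity, which effectively improves pattern coverage and system performance.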
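The dictionary-checked bootstrapping loop can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the pattern shape (one word on each side of a candidate), the scoring rules (pattern confidence as the share of its extractions found in the dictionary, instance confidence as the best confidence among the patterns that extract it), the threshold `min_conf`, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def extract(pattern, sentences):
    """Return every token that sits between the pattern's left and right word."""
    left, right = pattern
    found = set()
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            if tokens[i - 1] == left and tokens[i + 1] == right:
                found.add(tokens[i])
    return found


def induce_patterns(instances, sentences):
    """Collect (left word, right word) contexts around known instances."""
    patterns = set()
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            if tokens[i] in instances:
                patterns.add((tokens[i - 1], tokens[i + 1]))
    return patterns


def bootstrap(seeds, sentences, dictionary, rounds=2, min_conf=0.5):
    """Grow the instance set, admitting only instances backed by a trusted pattern."""
    instances = set(seeds)
    for _ in range(rounds):
        # Assumed rule: pattern confidence is the share of its extractions
        # that the high-accuracy dictionary confirms. Each pattern was induced
        # from a sentence, so it always extracts at least one token.
        pattern_conf = {}
        for pattern in induce_patterns(instances, sentences):
            hits = extract(pattern, sentences)
            pattern_conf[pattern] = len(hits & dictionary) / len(hits)

        # Assumed rule: instance confidence is the best confidence among
        # the patterns that extract it.
        instance_conf = defaultdict(float)
        for pattern, conf in pattern_conf.items():
            for instance in extract(pattern, sentences):
                instance_conf[instance] = max(instance_conf[instance], conf)

        # Low-confidence instances are never added, so they cannot seed
        # further unreliable patterns in later rounds.
        instances |= {x for x, conf in instance_conf.items() if conf >= min_conf}
    return instances
```

A pattern that extracts many tokens absent from the dictionary receives a low confidence, so the tokens it proposes stay out of the instance set; this is one simple way to limit the error amplification described above.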
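     (2) Event detection and classification, and recognition of event triggers.
     The ACE corpus is small and its classes are imbalanced. To address this, a good feature selection strategy is used to overcome the poor performance of ordinary classifiers on small and hard-to-recognize classes. A feature selection strategy that combines local feature selection with positive and negative features is proposed, which safeguards the classifier's recognition performance on every class, especially the small and hard-to-recognize ones. In addition, trigger recognition is studied for the case where the event type is known: a strategy is proposed that makes full use of features from positive and negative examples and expands the features with semantic dictionaries such as Tongyici Cilin (《同义词词林》) and HowNet.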
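A minimal sketch of local feature selection with positive and negative features follows. The abstract does not name the scoring metric, so a signed chi-square statistic is assumed here; `signed_chi2`, `select_local_features`, `k_pos`, and `k_neg` are hypothetical names. 'Local' means that features are ranked separately for each event type, so a small type keeps its own indicative features instead of competing with large types in one global ranking.

```python
def signed_chi2(a, b, c, d):
    """Chi-square of a 2x2 table, signed by the direction of association.

    a: docs of the class containing the feature
    b: docs of other classes containing the feature
    c: docs of the class lacking the feature
    d: docs of other classes lacking the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    diff = a * d - b * c
    score = n * diff * diff / denom
    return score if diff > 0 else -score


def select_local_features(docs, labels, k_pos=2, k_neg=1):
    """Per class, keep the k_pos most indicative and k_neg most counter-indicative features."""
    vocab = sorted({f for doc in docs for f in doc})
    selected = {}
    for cls in sorted(set(labels)):
        scores = {}
        for f in vocab:
            a = sum(1 for doc, y in zip(docs, labels) if f in doc and y == cls)
            b = sum(1 for doc, y in zip(docs, labels) if f in doc and y != cls)
            c = sum(1 for doc, y in zip(docs, labels) if f not in doc and y == cls)
            d = len(docs) - a - b - c
            scores[f] = signed_chi2(a, b, c, d)
        ranked = sorted(vocab, key=lambda f: scores[f], reverse=True)
        # Positive features signal membership; negative features signal non-membership.
        positive = [f for f in ranked[:k_pos] if scores[f] > 0]
        negative = [f for f in ranked[::-1][:k_neg] if scores[f] < 0]
        selected[cls] = {"positive": positive, "negative": negative}
    return selected
```

Each document is represented as a set of feature strings. Because the ranking is done per class, a type with a single training example still receives its own positive and negative features.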
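     (3) Recognition of event argument roles.
     To make full use of linguistic information at different levels, such as lexical and syntactic information, multi-level patterns are proposed for recognizing event argument roles. Each level of pattern contains linguistic information from a different level, exploiting high-accuracy shallow lexical information while also taking into account dependency-syntax information, which better reflects meaning. A soft-matching part is introduced into the deeper-level patterns, making them more flexible and enabling fuzzy matching. A CRF-based method for event role recognition is then explored in which patterns and their similarities serve as features; this broadens the range of features available to the classifier, makes the features finer-grained and more comprehensive, and yields good role recognition results.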
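The idea of trying strict shallow patterns first and falling back to softer, deeper ones can be sketched as a cascade. This is only an illustration under stated assumptions: the abstract does not specify how levels are represented or scored, so the level tuple `(key, threshold, patterns)`, the use of Jaccard overlap as the similarity, and the names `jaccard` and `match_role` are all hypothetical.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for the similarity measure, which the abstract does not specify."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_role(candidate, levels):
    """Try pattern levels in order and return (role, level index, score) for the first hit.

    candidate: dict mapping a representation name (e.g. "words", "deps") to a list of items
    levels: list of (key, threshold, patterns); patterns is a list of (items, role)
    A threshold of 1.0 demands an exact set match; lower thresholds allow fuzzy matching.
    """
    for index, (key, threshold, patterns) in enumerate(levels, start=1):
        for pattern, role in patterns:
            score = jaccard(candidate[key], pattern)
            if score >= threshold:
                return role, index, score
    return None
```

In this sketch the first level would hold lexical patterns with a threshold of 1.0, and a later level would hold dependency-based patterns with a lower threshold, so that a candidate missed by the exact lexical patterns can still be matched softly. The similarity score returned here is also the kind of value that could be passed to a statistical model as a feature.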
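     (4) Confidence estimation for event extraction.
     Because the precision of event extraction is imperfect, two confidence estimation methods are examined: one estimates confidence directly from the output probability of the source system; the other is an independent confidence estimator based on ME (maximum entropy). Confidence estimation is evaluated with the ROC method. The results show that the independent estimation strategy estimates confidence better than direct use of the source system's output, laying a foundation for practical use of the system.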
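ROC-based evaluation of a confidence estimator can be sketched as follows. The code takes one confidence score per extracted item together with a flag saying whether the extraction was correct, sweeps the acceptance threshold from high to low, and summarizes the curve by its area (AUC). The function names are hypothetical, and the abstract does not say whether the thesis reports AUC or another summary of the curve, so the use of AUC here is an assumption.

```python
def roc_points(confidences, correct):
    """Return ROC points (false-positive rate, true-positive rate).

    confidences: one confidence score per extracted item
    correct: 1 if the extraction was right, 0 if it was wrong
    Requires at least one correct and one incorrect item.
    """
    pos = sum(correct)
    neg = len(correct) - pos
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Items with equal confidence cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Two estimators, such as the source system's output probability and a separate estimator, can be compared by running both score lists through `roc_points` and `auc` on the same set of extractions; a larger area means that correct extractions tend to receive higher confidence than incorrect ones.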
Information extraction (IE) is a fundamental technique for automatically obtaining information from text. IE from free text generally covers the extraction of entities and their relations. However, as the real world changes, the status of entities and the relations among them change as well, and events reflect such changes. To capture these changes, it is therefore necessary to extract information about events.
     Event Detection and Recognition (VDR) has been defined as a fundamental task in the Automatic Content Extraction (ACE) evaluation plan. The ACE2005 VDR task mainly involves detecting events of specified types and extracting the relevant information about them. The relevant information includes event attributes, event arguments, and event mentions. The event attributes are type, subtype, modality, polarity, genericity, and tense. Under this definition, VDR comprises two subtasks: (1) event detection and classification, and (2) recognition of argument roles. Since argument roles are usually entities involved in an event, named entity recognition is also a fundamental subtask of VDR. This thesis studies three subtasks of VDR: expanded named entity recognition, event detection and classification, and recognition of argument roles. Finally, because the accuracy of VDR is not perfect, the thesis discusses confidence estimation for VDR.
     The main research contents of this thesis are as follows:
     (1) Study on expanded named entity recognition.
     To alleviate the scarcity of large-scale annotated corpora, bootstrapping, a semi-supervised learning method, is used to acquire patterns automatically, and the selection and evaluation of seeds and examples are discussed in detail. On this basis, the thesis focuses on pattern generalization and presents two forms of it: soft patterns and feature vectors. Both effectively improve pattern coverage and system performance.
     (2) Study on event detection and classification.
     To address the small size and class imbalance of the ACE corpus, this thesis uses a feature selection strategy to alleviate the poor performance of classifiers on small and difficult types. It proposes an approach to identifying Chinese event types that combines local feature selection with positive and negative features. The approach safeguards performance on each type, especially the small and difficult ones. In addition, the thesis presents an approach to recognizing triggers, given known event types, using an ME model. The approach uses features from the positive and negative examples and expands them with two semantic dictionaries, HowNet and CiLin.
     (3) Study on recognition of argument roles.
     First, an approach that uses multi-level patterns to identify Chinese event argument roles is proposed. It introduces four levels of patterns to make full use of word and dependency grammar information. Patterns at the higher levels are soft patterns, which carry flexible information and support fuzzy matching. The thesis then introduces the multi-level patterns as features in a CRF model to identify argument roles. The related experiments show that introducing multi-level patterns into the statistical model effectively improves system performance.
     (4) Study on confidence estimation (CE) for event extraction.
     To address the imperfect precision of automatic event extraction, two methods of confidence estimation are discussed. One uses the system output probability to estimate confidence; the other uses a separate CE module based on an ME model. The ROC method is then used to evaluate the CE results. The related experimental results show that the separate CE module has better estimation power than the original system output, which provides more useful information for applications of the system.
