Research on Text Semantic Relevance Calculation and Its Application

Original title (Chinese): 文本间语义相关性计算及其应用研究
Author: Zhao Yuming (赵玉茗)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: relevance; semantics; lexical cohesion; text filtering; question answering
Degree year: 2009
Advisor: Wang Xiaolong (王晓龙)
Discipline code: 081203
Degree-granting institution: Harbin Institute of Technology (哈尔滨工业大学)

Abstract

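In today's world of rapidly expanding information, text has remained the main form in which information is disseminated and stored, owing to its flexible expression, its large information capacity and, most importantly, its naturalness for human readers. How to process this vast body of text so that people can manage and use it better is one of the fundamental problems of the information age. Examining the relationships between texts, and associating and separating them sensibly according to their content so that more complex and deeper downstream processing can be applied smoothly, is therefore a primary task of text information processing.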
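     For a long time, researchers in computer science gave little thought to the concept of relevance itself. As a result, similarity measures have been the dominant substitute for relevance measures when relationships between texts are examined. In some cases a similarity measure can approximate a relevance measure to a certain extent. However, in many applications that emphasize how strongly texts are related rather than how similar they are, the starting point of these methods diverges from what the application actually cares about, so they often fail to deliver the quality the application requires.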
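     Drawing on in-depth analyses of the nature of relevance by researchers in cognitive science, information science and other fields, and working within current technical conditions, this thesis exploits users' general background knowledge to improve the system-oriented mode of relevance calculation at the semantic level, moving it toward the user-oriented mode so as to simulate human relevance judgments. Relevance calculation methods are studied separately for sentences and documents, two kinds of text that differ in scale and level, and their applications in the corresponding fields are discussed. The work covers the following aspects.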
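     For the task of extracting candidate answer sentences in automatic question answering, an enhanced system similarity model based on system similarity theory is proposed to compute the relationship between the user's question and the sentences of candidate documents. The model introduces candidate answer elements and assigns them a simulated similarity so that they contribute positively to the similarity between sentences. This turns the similarity measure into a relevance measure and meets the needs of question answering more accurately. A question answering system built around this sentence relevance method achieved excellent results in a leading international question answering evaluation, and tests of the method on the evaluation data set also show its advantage.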
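The idea of letting a candidate answer element add to a similarity score can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's model: the exact-match element similarity, the single virtual "answer slot", the default value `simulated_sim=0.5` and the function names are all hypothetical.

```python
def element_similarity(a: str, b: str) -> float:
    """Toy element similarity (assumption): 1.0 for a case-insensitive match, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def relevance(question, sentence, is_answer_candidate, simulated_sim=0.5):
    """Score a candidate sentence against a question.

    Each question element is matched to its most similar sentence element.
    One extra virtual element stands for the expected answer: it scores
    `simulated_sim` if the sentence holds any token accepted by
    `is_answer_candidate`, and 0.0 otherwise.
    """
    if not question:
        return 0.0
    matched = sum(
        max((element_similarity(q, s) for s in sentence), default=0.0)
        for q in question
    )
    has_candidate = any(is_answer_candidate(s) for s in sentence)
    bonus = simulated_sim if has_candidate else 0.0
    # Normalise by the number of question elements plus the virtual answer slot.
    return (matched + bonus) / (len(question) + 1)
```

With question elements `["moons", "mars"]` and a predicate that accepts number words, the sentence "mars has two moons" scores higher than "mars moons are small", although both overlap the question equally. That gap is the difference between a pure similarity score and the relevance score described above.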
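     Beyond sentence-level semantic relevance, the thesis also proposes a new method for measuring relevance between documents. Exploiting the lexical cohesion of documents and drawing on knowledge sources such as semantic dictionaries, it defines and examines the semantic links between the words of a document. On that basis it proposes a formal lexical-chain representation of documents, a method for weighting lexical chains, and a corresponding document matching method. Based on an analysis of how humans judge relevance, it also proposes an evaluation method that uses text classification to assess the quality of relevance calculation. Experiments show that the lexical-cohesion-based document relevance method performs well.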
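A minimal sketch of chain-based document matching is given below. It is an assumption-laden illustration rather than the thesis's formulation: `RELATED_GROUPS` is a hypothetical stand-in for a semantic dictionary, the greedy chain construction, the coverage-based weight and the min-weight matching rule are all choices made for this example.

```python
from collections import Counter

# Hypothetical stand-in for a semantic dictionary: groups of related words.
RELATED_GROUPS = [
    {"drug", "medicine", "dose", "inhibitor"},
    {"bank", "loan", "interest", "money"},
]


def related(a: str, b: str) -> bool:
    """Two words are linked if they are equal or share a group."""
    return a == b or any(a in g and b in g for g in RELATED_GROUPS)


def lexical_chains(tokens):
    """Greedily group a document's words into chains; return (words, weight) pairs."""
    counts = Counter(tokens)
    chains = []
    for word in counts:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.add(word)
                break
        else:
            chains.append({word})
    total = sum(counts.values())
    # Weight (assumption): share of the document's tokens covered by the chain.
    return [(c, sum(counts[w] for w in c) / total) for c in chains]


def document_relevance(tokens_a, tokens_b):
    """Sum the smaller weight over chain pairs joined by at least one link."""
    score = 0.0
    for chain_a, weight_a in lexical_chains(tokens_a):
        for chain_b, weight_b in lexical_chains(tokens_b):
            if any(related(x, y) for x in chain_a for y in chain_b):
                score += min(weight_a, weight_b)
    return score
```

Two documents can score above zero here without sharing a single word, as long as their chains are linked through the dictionary. A plain word-overlap similarity would miss that case.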
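     Building on this, the thesis proposes a definition of links between words with an adjustable distance, analyzes further the internal structure of the word clusters formed by lexical cohesion in a document, and proposes a document relevance method based on structured lexical cohesion that makes use of this structural information. The experiments confirm the advantage of this method.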
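The abstract does not say how the adjustable distance is defined. The sketch below assumes it is the path length between two words in a semantic network, with a tunable threshold `max_distance`, and uses the number of links each word has inside a cluster as one simple kind of structural information. The network `EDGES` and all function names are hypothetical.

```python
from collections import deque

# Hypothetical semantic network: undirected edges between related words.
EDGES = [
    ("warfarin", "anticoagulant"),
    ("anticoagulant", "drug"),
    ("drug", "inhibitor"),
    ("inhibitor", "enzyme"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def linked(graph, a, b, max_distance):
    """Two words are linked when their path length is within the threshold."""
    d = path_length(graph, a, b)
    return d is not None and d <= max_distance


def link_degrees(graph, words, max_distance):
    """Cluster structure: how many other cluster words each word links to."""
    return {
        w: sum(linked(graph, w, o, max_distance) for o in words if o != w)
        for w in words
    }
```

Raising `max_distance` admits looser links and produces larger, denser clusters. The per-word degrees could then serve to weight central words more heavily than peripheral ones, which is one possible reading of "structured" cohesion.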
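     In addition, to address the shortage of parameters needed to train pharmacokinetic models in drug development, the thesis studies the application of the lexical-cohesion-based relevance method to filtering documents related to pharmacokinetic parameters in the biomedical domain. This part covers the system architecture and the special text preprocessing adopted for the domain. In experiments on 8 drugs in three classes (enzyme substrates, inducers and inhibitors), a text filtering system with the lexical-cohesion-based relevance method at its core achieved good results, which is of considerable practical significance for improving the efficiency of drug development in the biomedical field.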

English abstract

In a world of enormous information, text is an important format for distributing and storing information because of its flexibility, capacity and convenience. How to process masses of text data so that they can be managed and used efficiently is one of the fundamental problems of this age. In text processing, measuring the relationships between texts and grouping the chaotic texts into clusters according to their content, so that detailed subsequent processing can be applied to them, is a paramount problem.
     For a long time, lacking deep discussion of the connotation of the concept of "relevance", researchers in computer science have used text similarity calculation in place of text relevance calculation when measuring relationships between texts. This approximation, whose motivation is not explicit, cannot satisfy the requirements of applications that emphasize relevance.
     In this thesis, based on the analysis of the concept of "relevance" offered by researchers in both cognitive science and information science, the system-oriented relevance calculation mode is improved at the semantic level. It takes advantage of users' general knowledge and moves the system-oriented mode toward the user-oriented mode in order to simulate human relevance judgments. For two sub-types of text, sentences and documents, we study relationship measurement separately and discuss the corresponding applications. The detailed content of this thesis includes:
     An improved system similarity model, based on system similarity theory, is proposed for sentence retrieval in question answering systems. It lets latent answer elements contribute to the text similarity degree by giving each a simulated similarity parameter. In this way, it changes the similarity calculation model into a relevance calculation model and satisfies the requirements of the question answering system. The system that takes this processing as its main feature achieved excellent results in an authoritative international evaluation, and further evaluation of the method on the same test data also confirms its effectiveness.
     Besides calculation between sentences, a novel relevance calculation method between documents is proposed. Based on lexical cohesion theory, and with the help of knowledge resources, we detect the semantic relationships between words and propose a document representation method based on lexical chains, a lexical chain weighting method and a corresponding document matching method. Drawing on an analysis of the features of human relevance judgments, we propose a method for evaluating document relevance calculation through document classification. The test results show that the lexical-cohesion-based method works well.
     Furthermore, we present a distance-flexible method for detecting semantic relationships between words. By analyzing the inner structure of lexical cohesion, we present a document relevance calculation method based on lexical cohesion with structure information. The advantage of this method is demonstrated in the experiments.
     To support the training of pharmacokinetic models in new drug development, we study the application of text filtering. The filtering system retrieves papers about pharmacokinetic parameters by applying document relevance calculation based on lexical cohesion. Its structure and the special text preprocessing for this field are also described. In an evaluation on 8 drugs of 3 classes (substrate, inducer and inhibitor), the filtering system, which takes the lexical-cohesion-based document relevance calculation method as its central processing step, achieves excellent results. It contributes significantly to improving the efficiency of drug development.
