单类中心学习及其在二元关系抽取中的应用

英文题名：The Learning of Single Class Center and the Application in Binary Relation Extraction
作者：李志圣
论文级别：博士
学科专业名称：计算机应用技术
中文关键词：self-training ; 单类中心学习 ; 二元关系抽取 ; 高斯混合模型 ; 问答模式学习 ; 术语模式学习
英文关键词：self-training ; the learning of centre of single class ; the extraction of binary relation ; Gaussian mixture model ; learning of Q-A patterns ; learning of terminology patterns
学位年度：2008
导师：何丕廉
学科代码：081203
学位授予单位：天津大学
论文提交日期：2008-05-01

摘要

在互联网上进行二元关系抽取,是当前信息抽取的重要研究方向。为利用互联网的大量未标定语料,许多文献提出了基于self-training机制的学习方法:即在小标注集上训练初始系统,然后在系统运行过程中,自动标定可靠候选,重新训练,以改进系统性能。
     实践证明:上述方法在二元关系抽取中是行之有效的,但已有文献缺乏对学习过程的理论分析。
     本文首先将在二元关系抽取中的模式学习问题转化为单类文本中心的学习问题。在文本向量空间中,当初始中心被给定后,可将其足够小邻域内的文本向量作为自动标定数据。本文要解决的核心问题是:当数据集具有何种特性时,利用自动标定数据能确定地改进对单类中心的学习?
     为解决该问题,本文研究文本向量空间的分布特性。为克服高斯混合模型在描述具有硬聚类特性的数据分布时的缺点,本文提出了基于k-means算法划分区域的TGMK模型,并揭示了TGMK模型与k-means算法、高斯混合模型的密切联系。实验结果表明:TGMK模型适合描述多类文本数据。
     本文在k-means算法基础上提出了single-mean算法。文中证明:当多类数据集适合被1-TGMK的泛化模型—1-TGMR模型所描述时,新算法从目标类的初始中心出发,将收敛到实际中心。至此,完成了对核心问题的解答。实验表明了新算法在文本数据上的有效性,从而说明了self-training机制在二元关系抽取中的有效性。
     本文为二元关系抽取工作建立了基于single-mean算法的形式化学习模型,并针对在互联网上进行二元关系抽取的特殊性,提出了新的候选评分方法和自动标定方法。
     本文将学习模型应用到中文问答对和中英文术语对的抽取中。与前人工作不同的是:本文将self-training机制引入中文问答模式和中英文术语模式的学习中,使得系统对人工标定语料的依赖度减到最小;本文利用启发规则,改进模式和候选的评分方法。实验表明:与同类系统相比,新系统能在更小的标注集上,实现更优的性能。
To extract the binary relation from web is an important research direction in the field of information extraction. Many literatures had presented learning methods based on self-training mechanism. In these methods, an initial system is trained on a small labeled data set. Then the system labels the reliable candidate data to re-train itself for better performance.
     These Literatures show that the above methods are efficient in extraction of binary relation. But no literature tries to analyze the methods strictly.
     This paper transforms the pattern learning in the extraction of binary relation into the learning of centre of single text class. In text vector space, the vectors in the small neighborhood region of the initial centre could be labeled as the reliable data. This paper aims to answer the key problem: what nature the data set should owns so that the self-labeled data can definitely improve the learning of single class centre.
     This paper solves the key problem through the study on the nature of text vector space. For conquer the defects in the description for distribution of“hard”data set by Gaussian mixture model, this paper presents a new model: TGMK model based on the partitions acquired by k-means algorithm, and exposes the relations among the TGMK model, k-means algorithm and Gaussian mixture model. The experiment result shows that TGMK model is suitable as the description for the text data set of multiple classes.
     Based on k-means algorithm, this paper presents a new algorithm: single-mean algorithm. This paper proves that if the data set of multiple classes is suitable to be described by 1-TGMR model which is the generalization version of TGMK model, the output centre of single-mean algorithm will definitely converge to the actual centre of data set from the initial centre. The above researches solve the key problem perfectly. The experiment shows that single-mean algorithm is efficient on the text data set of multiple classes, which also shows that the learning method based on self-training mechanism is efficient in the extraction of binary relation.
     This paper creates a formal learning model for the extraction of binary relation based on single-mean algorithm, and presents a new score method for candidates and a new self-labeled method against the particularity of the extraction of binary relation from web.
     This paper uses the formal model to acquire Chinese Q-A patterns and Chinese-English terminology pairs. Differently with the previous work, this paper presents new methods to learn Chinese Q-A patterns and Chinese-English terminology patterns based on self-training mechanism, which reduces the dependence of labeled data set to the maximum extent. This paper also utilizes the heuristic rules to improve the scoring methods for the patterns and candidates. The experiment results show that compared with the same kind of systems, our systems have better performances on smaller labeled data set.

引文

[1]李保利,陈玉忠,俞士汶,信息抽取研究综述,计算机工程与应用,2003,10(1):1~6
    [2]赵一唯,王和珍,李振东,WWW信息检索综述,南京大学学报(自然科学版),2001,37(2):192~198
    [3]董静,孙乐,冯元勇,黄瑞红,中文实体关系抽取中的特征选择研究,中文信息学报,2007,21(4):80~86
    [4] ACE,The nist ace evaluation website,www.nist.gov/speech/tests/ace/ace07,2007
    [5]陈英,徐罡,顾国昌,一种本体和上下文知识集成化的数据挖掘方法,软件学报,2007,18(10):2507~2515
    [6]李维刚,刘挺,李生.基于网络挖掘的实体关系元组自动获取[J].电子学报. 2007. 11(35):2110-2115
    [7] C. Aone, M. Ramos Santacruz, Rees: A large-scale relation and event extraction system, In the Proceedings of the 6th Applied Natural Language Processing Conference, 2000, 76~83
    [8] Liu Kebin, Li Fang, Liu Lei, Han Ying, Implementation of a Kernel-Based Chinese Relation Extraction System, Journal of Computer Research and Development, 2007, 44(8): 1406~1411
    [9] Yuh-Jye Lee, Su-Yun Huang, Reduced Support Vector Machines: A Statistical Theory, IEEE Transactions on Neural Networks, 2007, 18(1):1~13
    [10] T. Zhang, Regularized winnow methods, In Advances in Neural Information Processing Systems, 2001, 703~709
    [11]车万翔,刘挺,李生,实体关系自动抽取[J],中文信息学报,2005,2(19):1~6
    [12] Miller S., Fox H., Ramshaw L., A novel use of statistical parsing to extract information from text, In Proceedings of 6th Applied Natural Language Processing Conference, 2000
    [13] Che Wanxiang, Jiang Jianmin, Su Zhong, Improved-Edit-Distance Kernel for Chinese Relation Extraction, In Proceedings of the Second International JointConference on Natural Language Processing, 2005, 132~137
    [14] N. Cristianini, J. Shawe Taylor, H. Lodhil, Latent semantic kernels, Journal of Intelligent Information Systems, 2002, 18(2-3):127~152
    [15] Blanchard G.. O., Bousquet L., Zwald. Statistical Properties of Kernel Principal Component Analysis, Machine Learning, 2006, 66(2-3), 259~294
    [16] A. Culotta, J. Sorensen, Dependency tree kernelsfor relation extraction, In Proceedings of The 42nd Meeting of Association for Computational Linguistics, 2004
    [17] J. Suzuki, T. Hirao, Y. Sasaki, Hierarchical directed acyclic graph kernel: Methodsfor structured natural language data, In Proceedings of The 41st Annual Meeting of the Associationfor Computational Linguistics ACL2003, 2003
    [18] H. Lodhi, C. Saunders, J. Shawe Taylor, Text classification using string kernels, Journal of Machine Learning Research, 2002, 2: 419~444
    [19] Brin S, Extracting Patterns and Relations from the World Wide Web,In Proceedings of WebDb Workshop at Edbt98, 1998
    [20] Eugene Agichtein, Luis Gravano, Snowball: Extracting Relations From Large Plain-Text Collections, In Proceedings of the 5th ACM International Conference on Digital Libraries(DL), 2000
    [21]姜吉发,王树西,一种自举的二元关系和二元关系模式获取方法,中文信息学报,2005,19(2):71~77.
    [22] Tomita J., Soderland S., Etzioni O., Expanding the Recall of Relation Extraction by Bootstrapping. In Proceedings of EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM), 2006
    [23] Ellen Riloff, Rosie Jones, Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping, In Proceedings of the 6th AAAI-99, 1999
    [24] Michael Thelen and Ellen Riloff, A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts, In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002
    [25] Zhu X., Semi-Supervised learning literature survey, Report of Dept of Computer Sciences, University of Wisconsin-Madison, 2006
    [26] Yarowsky D., Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995, 189~196
    [27] Maeireizo B., Litman D., Hwa R., Co-training for predicting emotions with spoken dialogue data, In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), 2004
    [28] Rosenberg C., Hebert, M., Schneiderman, H, Semi-supervised selftraining of object detection models, Seventh IEEE Workshop on Applications of Computer Vision, 2005
    [29] McClosky D., Charniak E., Johnson M., Effective self-training for parsing, In Proceedings of ACL (NAACL), 2006
    [30] Fragouidis D., Likothanassis S. D., Retriever: a self-training agent for intelligent information discovery. In Proceedings of Information Intelligent and Systems International Conference, 1999, 594 ~599
    [31]曾华军,张银奎,机器学习,北京:机械工业出版社,2003
    [32] Roberts S., Tarassenko L., Pardey, J. A., validation index for artificial neural networks. In Proceedings of Int. Conference on Neural Networks and Expert Systems in Medicine and Healthcare, 1994, 23~30
    [33] Koch M., Moya M., Hostetler L., Cueing feature discovery and one-class learning for synthetic aperture radar automatic target recognition, Neural Networks. 1995, 8(7):1081~1102
    [34] D.M.J. Tax, One-class classification: Concept-learning in the absence of counter-examples:[Ph.D. thesis], Delft University of Technology, 2001, 1~190
    [35] Boucheron S.., Bousquet O., Lugosi G.., Theory of classification: A survey of some recent advances, ESAIM: Probability and Statistics, 2005, 9:323~375
    [36]史忠值,知识发现,北京:清华大学出版社,2002
    [37] Fukanaga K., Statistical pattern recognition, Academic press, 1990
    [38] T.M Cover, P.E Hart, Nearest Neighbor Pattem Classifieation, IEEE Transaetions on Information Theory, 1967, 13(1):21~27
    [39] T.Mitehell, Machine Learning, NewYork:McGraw-Hill, 1997
    [40] V.VaPnik, TheNature of Statistical Leaming Theory, NewYork: Springer-Verlag, 1995
    [41] R. Adwait, Maximum Entropy Models for Natural Language Ambiguity Resolution:[PhD Dissertation], University of Pennsylvania, 1998
    [42] Breiman L., Friedman J H, Olshen R A, Classification and regression trees, Belmont, California:Wadsworth International Group, l984
    [43] Li C. H., Park S. C., Text Categorization Based on Artificial Neural Networks,LECTURE NOTES IN COMPUTER SCIENCE, 2006, 42(34):302~311
    [44] Moya M., Koch M., Hostetler, L., One-class classifier networks for target recognition applications, In Proceedings of world congress on neural networks, Portland, 1993, 797–801
    [45] Ritter G.., Gallegos M., Outliers in statistical pattern recognition and an application to automatic chromosome classification, Pattern Recognition Letters, 1997, 18:525~539
    [46] Bishop C., Neural Networks for Pattern Recognition, Oxford University Press, 1995
    [47] Japkowicz N., Concept-Learning in the absence of counterexamples: an autoassociation-based approach to classification:[PhD thesis]. New Brunswick Rutgers, The State University of New Jersey, 1999
    [48] Japkowicz N, Myers C, Gluck M, A novelty detection approach to classification. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1995, 518~523
    [49] Scholkopf B., Platt J Shawe Taylor J, Estmating the support of a high dimensional distribution. NeuralComputation, 2001, 13(7):1443~1471
    [50] Tax D. M. J., Duin R. P. W., Support vector data description, Machine Learning, 2004, 54(1): 45~66
    [51] Takashi Onoda, Hiroshi Murata, Seiji Yamada, Non-Relevance Feedback Document Retrieval Based on One Class SVM and SVDD, In Proceedings of IEEE World Congress on Computational Intelligence, 2006, 2191~2198
    [52] Jian Li, Yizhuo Zhang, Liping Sun, SVDD-based Mechanical Fault Diagnosis for Fiberboard Gluing System, In Proceedings of International Conference on Manufacturing Automation, Singapore, 2007
    [53] Sang-Woong Lee, Seong-Whan Lee, SVDD-Based Illumination Compensation for Face Recognition, In Proceedings of ICB, 2007, 154~162
    [54] Woo–Sung Kang and Jin Young Choi, SVDD-based method for Face Recognition System, In the Proceedings of ISIS, Tokyo, 2006
    [55] Lei Guo, Huaitie Xiao, Qiang Fu, Fuzzy recognition method for radar target based on KPCA and SVDD, In the Proceedings of MIPPR, 2007
    [56] Kunlun Li, Huang houkuan and Shengfeng Tian, Improving One-Class SVM for Anomaly Detection, In Proceedings of International Conference on Machine Learning and Cybernetics, 2003, 3077~3081
    [57] Roberto Perdisci, Guofei Gu, Wenke Lee, Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems, ICDM 2006: 488-498
    [58] Rabaoui A., Ellouze N., Improved one-class SVM classifier for sounds classification, In the Proceedings of Advanced Video and Signal Based Surveillance( AVSS), 2007
    [59] Lewis Barnett V., Lewis T., Outliers in statistical data, Wiley series in probability and mathematical statistics. John Wiley & Sons Ltd., 2nd edition, 1978
    [60] Tarassenko L., Hayton P., Brady M., Novelty detection for the identification of masses in mammograms, In Proceedings of the Fourth International IEE Conference on Artificial Neural Networks, 1995, 409: 442~447
    [61] Parzen, E., On estimation of a probability density function and mode, Annals of Mathenatical Statistics, 1962, 33:1065~1076
    [62] Parra L., Deco G., Miesbach S., Statistical independence and novelty detection with information preserving nonlinear maps, Neural Computation, 1996, 8:260~269
    [63] Ritter G., Gallegos M., Outliers in statistical pattern recognition and an application to automatic chromosome classification, Pattern Recognition Letters, 1997, 18:525~539
    [64] Moya M., Hush D., Network contraints and multiobjective optimization for one-class classification, Neural Networks, 1996, 9(3):463~474
    [65] Moya M, Koch M, Hostetler L, One-class classifier networks for target recognition applications, In Proceedings of world congress on neural networks, Portland, 1993, 797–801
    [66] Castells, P., Fernández, M., and Vallet D., An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval, IEEE Transactions on Knowledge and Data Engineering, 2007, 19(2): 261~272
    [67] Christopher D., Manning, Hinrich Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999
    [68]王海涌,郑丽英,刘丽艳,基于文本表示的特征项权值确定方法研究,甘肃科学学报,2005,17(3):86~89
    [69] Andrew McCallum, Kamal Nigam, A comparison of event models for naive Bayes text classi_cation, In Proceedings of AAAI-98 Work-shop on Learning for Text Categorization, 1998
    [70] D. Steinley, K-means clustering: A half-century synthesis, British Journal of Mathematical & Statistical Psychology, 2006, 59:1~34
    [71]盛骤,谢式千,潘承毅,概率论与数理统计[M],高等教育出版社,1994
    [72]索红光,王玉伟,一种用于文本聚类的改进k-means算法,山东大学学报,2008,43(1):60~64
    [73] P. Mitra, S K Siddiqi, Non-convex clustering using expectation maximization algorithm with rough set initialization, Pattern Recogn Lett., 2003, 24:863~873
    [74] Bradley P., Fayyad U., Refining initial points for k-means clustering, In Proceedings of 15th Internat Conference on Machine Learning, Morgan Kaufmann, 1998
    [75] S. Zhong, J. Ghosh. A comparative study of generative models for document clustering, In Proceedings of SDW workshop on clustering high dimensional data and its applications, 2003
    [76] Richard O Duda, Peter E Hart, David G. Stork, Patterns Classification, Second Edition, John Wiley& Sons, Inc., 2001
    [77] M. J. Kearns, Y Mansour A Ng, An information-theoretic analysis of hard and soft assignment methods for clustering, In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, 1997
    [78] H. J. Zeng, X. H. Wang, Z. Chen, CBC: Clustering based text classification requiring minimal labeled data, In Proceedings of the third IEEE International Conference on Data Mining, 2003
    [79] Hartman E., J. Keeler, J. Kowalski, Layered neural networks with gaussian hidden units as universal approximations, Neural Computation, 1990, 210~215
    [80] McLachlan G, K Basford, Mixture Models: Inference and Applications to Clustering, Series on Statistics: Textbooks and Monographs, Marcel Dekker, New York, 1987, 37~70
    [81]黄国宏,刘刚,一种新的基于高斯混合模型的线性判别分析,计算机工程与应用,2007,43(27):75~77
    [82] Lei Xu, Bayesian Ying-Yang Machine - Clustering and Number of Clusters, Pattern Recognition Letters, 1997, 18(13):1167~1178
    [83] Y. Cheung, k*-means: A new generalized k-means clustering algorithm, Pattern Recognition Letters, 2003, 2883~2893
    [84] Xu, L., How many clusters: A YING-YANG machine based theory for a classical open problem in pattern recognition, Pattern Recognition Letters, 1996.3:1546~1551
    [85] M.A. Jinwen, Automated Model Selection (AMS) on Finite Mixture: a New Perspective for Data Modeling, CHINESE JOURNAL OF ENGINEERING MATHMATICS, 2007, 24(4):571~584
    [86]杨昆,吴东旭,截尾正态分布X控制图模拟,中国管理信息化,2007, 10(3):51~52
    [87]杨坚,罗四维,刘蕴辉,一种基于广义KL距离和几何曲率的模型选择准则,电子学报,2005,12(33):2271~2277
    [88] David Lewis, Reuters Text Categorization Test Collection, http://www.daviddlewis.com/resources/testcollections/reuters21578/, 2005
    [89]谭松波,王月粉,中文文本分类语料库, http://lcc.ict.ac.cn/-tansongbo/corpus1.php,2005
    [90] Yang Yiming, Pederson J O., A Comparative Study on Feature Selection in Text Categorization, In Proceedings of the 14th International Conference on Machine learning, Nashville: Morgan Kaufmann, 1997, 412~420
    [91]范劲松,特征选择和提取要素的分析及其评价,计算机工程与应用,2001, 13:95~99
    [92] Mlademnic, D., Grobelnik. M., Feature Selection for unbalanced class distribution and Na?ve Bayees, In Proceedings of the Sixteenth International Conference on Machine Learning, Bled: Morgan Kaufmann, 1999, 258~267
    [93]周茜,赵明生,中文文本分类中的特征选择研究,中文信息学报,2004, 18(3):17~23
    [94]王梦云,曹素青,基于字频向量的中文文本自动分类系统,情报学报,2000, 19(6):644~649
    [95]杨健,杨静宇,叶晖,Fisher线性鉴别分析的理论研究及其应用,自动化学报,2003,29(4):481~493
    [96] Fukunaga K., Hostetler L. D., The estimation of the gradient of fl density function with applications in pattern recognition, IEEE Trans on Information Theory, 1975, 21(1):32~40
    [97] Cheng Y., Mean shift, mode seeking and clustering, IEEE Trans. on Pattern Analysis and Machine Intelligence, 1995, 17(8):790~799
    [98]吴友政,赵军,徐波,基于无监督学习的问答模式抽取技术,中文信息学报,2007,21(2):69~76
    [99] M M Soubbotin, S M Soubbotin, Use of Patterns for Detection of Likely AnswerStrings: A Systematic Approach, In Proceedings of the Eleventh Text Retrieval Conference, Gaithersburg, Maryland, 2002
    [100]Deepak Ravichandran, Eduard Hovy, Learning Surface Text Patterns for a Question Answering, In Proceeding of the ACL2002 Conference, Philadelphia, 2002
    [101]Dell Zhang, Wee Sun Lee, Web Based Pattern Mining and Matching Approach to Question Answering, In Proceedings of TREC211, Gaithersburg, 2002
    [102]Dmitri Roussinov, Jose Antonio Robles-Flores, Web question answering through automatically learned patterns, In Proceedings of JCDL2004, 2004, 347-348
    [103]郑逢斌,陈志国,姜保庆,乔保军,语义校对系统中的句子语义骨架模糊匹配算法,电子学报,2003,31(8):1138~1140
    [104]J. Ponte, W. Bruce Croft, A Language Modeling Approach to Information Retrieval. In the Proceedings of ACM SIGIR 1998, 1998, 275~281
    [105]Hutchins W, Somers H., An Introduction to Machine Translation, Academic Press, 1992
    [106]Y. Cao, H. Li. Base Noun Phrase Translation Using Web Data and the EM Algorithm, In Proc. of COLING. 2002, 2002, 127~133
    [107]Nagata M, Saito T, Suzuki. K., Using the Web as a bilingual dictionary, In Proc. of ACL 2001 DD-MT Workshop, 2001, 95~102
    [108]Jian-Cheng Wu, Tracy Lin, Jason S Chang, Learning Source-Target Surface Patterns for Web-Based Terminology Translation. Proceedings of the ACL Interactive Poster and Demonstration Sessions, 2005, 37~40

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700