Research on Key Mining Techniques of Product Reviews in Chinese (中文产品评论挖掘关键技术研究)

Author: Huang Yongwen (黄永文)
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords: Product Reviews Mining; Semi-Supervised Learning; Support Vector Machine; Short Document Classification; Sequence Optimization
Degree year: 2009
Advisor: He Zhongshi (何中市)
Discipline code: 081202
Degree-granting institution: Chongqing University (重庆大学)
Thesis submission date: 2009-04-01

Abstract

With the rapid growth of the web, more and more user-centered product reviews are being published that reflect customers' experience and their opinions on product features, functions, and performance. By consulting reviews written by other users, customers can choose the products that suit them best, and manufacturers can improve their products and strengthen their competitiveness; research on product review mining has therefore become increasingly important. This thesis applies machine learning methods to several techniques involved in product review mining: short-text classification, mining of feature-opinion pairs, optimization of the mined feature-opinion pairs, and extraction of the hierarchical relationships among product features. The main results and contributions are summarized as follows.
     A product review classification method based on semantic features is proposed. Automatically classifying product reviews yields better research material and reduces the complexity of the review mining algorithms, which improves mining efficiency. Because product reviews are generally short, this thesis treats review classification as a short-text classification problem. First, product reviews collected from the web are manually labeled to obtain a training set. Then the terms ranked highest by the chi-square (χ²) statistic and the semantic content of each review (product features, opinion words, and degree words) are extracted as classification features; the amount of semantic content, the semantic content that was not selected, and the length of the review text are added as further features. Next, a support vector machine (SVM), which is well suited to binary classification, is trained on these features to obtain a classifier. Finally, the classifier is applied to the continually updated reviews on the web, and the high-quality reviews it selects are used to build a review corpus. Experiments show that adding semantic content clearly improves classification: precision rises by 9 percent to 80 percent, a good result for short texts such as product reviews.
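The χ² ranking step mentioned above can be illustrated with a minimal sketch. It assumes tokenized reviews, binary labels (1 for a good review), and the standard 2x2 contingency-table form of the χ² statistic; the function names `chi_square` and `top_terms` are hypothetical, and the semantic features and the SVM training stage are not shown.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a term for one class from a 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_terms(docs, labels, k):
    """Rank terms by chi-square against the positive class; keep the top k."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Document frequency: count each term once per document.
        (pos_df if label == 1 else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The selected terms would then be combined with the semantic features described above before training the classifier.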
     Following the idea of semi-supervised learning, a method is proposed that combines feature mining and opinion mining in a single process to obtain feature-opinion pairs. Because product features and opinion words stand in a modifying relation to each other, this thesis uses semi-supervised learning to mine the features that users write about (product components, functions, performance, and so on) together with the opinion words that express sentiment, so that the correspondence between each feature and its opinion is preserved. Semi-supervised learning can draw expert knowledge from a small number of labeled samples and can also use large amounts of unlabeled data to improve learning performance and generalization. Accordingly, a small number of manually defined feature-opinion pairs serve as seeds, and a pattern feature set built from the words, parts of speech, and modifying relations in review sentences is used to mine the review corpus for the product features and evaluations that users actually care about. The product feature words and opinion words obtained in this way are then used to process reviews in which several features share a single opinion; experiments show that this step improves both precision and recall by about 2 percent. Although the precision of mining features and opinions jointly is not very high, the relatively high recall allows the semi-supervised learning algorithm to discover new information.
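The seed-and-pattern loop described above can be sketched as follows. This is a deliberately simplified illustration, not the thesis's algorithm: a pattern here is just the tuple of words found between a known feature and its opinion word, whereas the thesis also uses parts of speech and modifying relations. The function name `bootstrap_pairs` and the `rounds` parameter are hypothetical.

```python
def bootstrap_pairs(sentences, seeds, rounds=2):
    """Grow a set of (feature, opinion) pairs from a few seed pairs.

    sentences: list of token lists
    seeds: iterable of (feature, opinion) tuples
    """
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: learn patterns from sentences that contain a known pair.
        for tokens in sentences:
            for feature, opinion in pairs:
                if feature in tokens and opinion in tokens:
                    i, j = tokens.index(feature), tokens.index(opinion)
                    # Require at least one word in between; an empty
                    # pattern would match every adjacent word pair.
                    if i + 1 < j:
                        patterns.add(tuple(tokens[i + 1:j]))
        # Step 2: apply every pattern to find candidate pairs.
        new_pairs = set()
        for tokens in sentences:
            for pattern in patterns:
                width = len(pattern)
                for i in range(len(tokens) - width - 1):
                    if tuple(tokens[i + 1:i + 1 + width]) == pattern:
                        new_pairs.add((tokens[i], tokens[i + 1 + width]))
        if new_pairs <= pairs:
            break  # nothing new was found, so stop iterating
        pairs |= new_pairs
    return pairs
```

Because every newly accepted pair can generate further patterns, errors accumulate across rounds, which is consistent with the modest precision and higher recall reported above.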
     To improve the quality of the mining results, a method is proposed that optimizes opinion sequences according to the Maximize Harmonic-Mean (MHM) principle. Semi-supervised learning has the drawback that precision drops sharply as the number of iterations grows, so precision is low and many of the extracted feature-opinion pairs are wrong. The harmonic mean is sensitive to extreme values, and more sensitive to very small values than to very large ones. Exploiting this property, opinion sequences with a large standard deviation are adjusted: low-frequency elements are deleted from a sequence so as to maximize its harmonic mean, which raises precision while preserving recall. Experimental results show that precision rises by 17 percent, reaching 77.3 percent, while recall falls by only 5 percent.
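The abstract does not give the exact MHM procedure, so the following is only a sketch of the general idea under stated assumptions: each opinion sequence is represented as a mapping from opinion word to a positive frequency, only sequences whose population standard deviation exceeds a threshold are adjusted, and at most `max_drop` of the lowest-frequency elements may be removed. The name `trim_low_frequency` and both parameters are hypothetical.

```python
from statistics import harmonic_mean, pstdev


def trim_low_frequency(counts, std_threshold, max_drop):
    """Drop up to max_drop low-frequency opinions to raise the harmonic mean.

    counts: dict mapping opinion word -> positive frequency for one feature
    """
    values = list(counts.values())
    # Sequences with a small spread are left untouched.
    if len(values) < 2 or pstdev(values) <= std_threshold:
        return dict(counts)
    # Highest frequency first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    best = ranked
    best_hm = harmonic_mean([v for _, v in ranked])
    for k in range(1, min(max_drop, len(ranked) - 1) + 1):
        candidate = ranked[:-k]  # remove the k lowest-frequency elements
        hm = harmonic_mean([v for _, v in candidate])
        if hm > best_hm:
            best, best_hm = candidate, hm
    return dict(best)
```

For positive frequencies, removing the smallest element never lowers the harmonic mean, so some stopping rule is required to protect recall; the `max_drop` cap used here is an assumption standing in for whatever rule the thesis applies.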
     A method is proposed for obtaining the hierarchical relationships among product features from product specifications and editorial reviews: structured data mining is applied to product specifications to obtain specification features and their hierarchy, and a semi-supervised learning (Bootstrapping) method is applied to editorial reviews to obtain descriptive features and their hierarchy. After extracting features and their opinion words, existing review mining systems do not further process hypernym-hyponym features or different wordings of the same feature, so different expressions of one feature are presented to the user as distinct features, and features at different levels are presented as parallel ones. This thesis first mines manufacturers' product specifications with a structured data mining method to obtain the hierarchical relationships among specification features, and then mines the editorial reviews provided by websites with a semi-supervised learning method to obtain descriptive features and their hierarchical relationships. The descriptive features extracted from a paragraph are then compared with the specification features by similarity to obtain the hierarchical relationships between specification features and descriptive features.
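The final similarity comparison can be illustrated with a small sketch. The abstract does not name the similarity measure, so character-bigram Jaccard similarity is used here purely as a stand-in; the function names and the `min_sim` threshold are hypothetical.

```python
def char_bigrams(text):
    """Set of character bigrams; very short strings fall back to themselves."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a, b):
    """Jaccard similarity between the bigram sets of two feature names."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    return len(ga & gb) / len(ga | gb)


def attach(descriptive, specification, min_sim=0.2):
    """Link each descriptive feature to its most similar specification feature.

    Descriptive features whose best match scores below min_sim stay unlinked.
    """
    links = {}
    for d in descriptive:
        best = max(specification, key=lambda s: jaccard(d, s))
        if jaccard(d, best) >= min_sim:
            links[d] = best
    return links
```

Character bigrams work on unsegmented strings, which is convenient for Chinese feature names, but any other string or semantic similarity could be substituted.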
     Finally, the extracted feature-opinion pairs are linked to the hierarchical relationships among features: different expressions of the same feature are merged, features in hypernym-hyponym relations are grouped, the opinions received by each feature are counted, and the evaluations of the product's features at each level are presented top-down as a tree.
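The merging, counting, and tree display described above might look like the following sketch. It assumes the synonym map and the child-to-parent map have already been produced by the earlier stages; the function name `summarize`, its parameters, and the output format are all hypothetical.

```python
from collections import Counter, defaultdict


def summarize(pairs, synonyms, parent, root):
    """Render opinion counts per feature as an indented tree (list of lines).

    pairs: iterable of (feature, opinion) tuples
    synonyms: dict mapping an alternative wording to its canonical feature
    parent: dict mapping a feature to its parent feature (assumed acyclic)
    root: name of the top-level node, for example the product itself
    """
    # Merge different wordings of the same feature and count opinions.
    opinions = defaultdict(Counter)
    for feature, opinion in pairs:
        opinions[synonyms.get(feature, feature)][opinion] += 1
    # Invert the child -> parent map into parent -> children.
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    def render(node, depth):
        counted = sorted(opinions.get(node, {}).items())
        counts = ", ".join(f"{word} x{n}" for word, n in counted)
        lines = ["  " * depth + node + (f": {counts}" if counts else "")]
        for child in sorted(children.get(node, [])):
            lines.extend(render(child, depth + 1))
        return lines

    return render(root, 0)
```

Joining the returned lines with newlines gives a top-down view in which each feature appears once, under its parent, with the opinions it received.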

