面向特定领域的互联网舆情分析技术研究 (Research on Domain-Oriented Public Sentiment Analysis Technology)

English title: Research on Domain-Oriented Public Sentiment Analysis Technology
Author: 张长利 (Zhang Changli)
Degree level: Doctoral
Major: Computer Application Technology
Keywords: focused crawling; PU text classification; opinion mining; sentiment classification; string kernel
Degree year: 2011
Supervisor: 左万利 (Zuo Wanli)
Discipline code: 081203
Degree-granting institution: Jilin University (吉林大学)
Submission date: 2011-06-01
Chair of the defense committee: 杨宏戟 (Yang Hongji)

Abstract

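With the rapid development of Internet technology, information on the web is growing exponentially, and the interactive technologies of Web 2.0 allow people to communicate and publish all kinds of opinions and comments online. As a result, the Internet holds a wide variety of public opinion information, but it is buried in an ocean of information, which makes it very difficult for people to find what they need. How can public opinion information about events in a specific domain be obtained from the web? Combining focused crawling with sentiment analysis makes domain-specific public opinion analysis possible. Analyzing online public opinion in a specific domain can provide decision support for the relevant authorities, help enterprises improve their plans, and give users helpful guidance. This dissertation addresses several of the key techniques, theories and methods involved, in the following three areas: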
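     (1) An incremental focused crawler based on synthetic link value is proposed. In collecting topic-relevant information, earlier crawlers achieved recall at the expense of harvest rate and crawling efficiency, while efforts to raise the harvest rate usually lowered recall. The proposed crawler uses front-end and back-end classifiers: the front-end classifier is a link-prediction classifier trained on a link context graph, which gives the crawler some ability to tunnel through off-topic pages, and the back-end classifier is a topic-content classifier that identifies topic-relevant pages. Combined with visual block segmentation of page content and link prediction based on each link's synthetic value, this improves recall, harvest rate and crawling efficiency.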
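     (2) A PU text classification method based on unsupervised clustering is proposed. Traditional machine-learning text classification models need a large labeled corpus as the training set. PU text classification addresses the practical problem that, in some machine-learning tasks, obtaining training samples is too costly, and negative samples in particular are hard to obtain. Because most traditional classification algorithms need both positive and negative data sets to perform well, applying them to PU classification hinges on extracting reliable negative examples from the unlabeled set U. This dissertation proposes an effective clustering-based reliable negative extraction algorithm, CBRN, and improves an existing PU text classification algorithm into the SPY-SVM algorithm. Together they increase the number and accuracy of extracted reliable negatives and improve the accuracy of PU text classification.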
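     (3) Opinion mining automatically extracts useful sentiment information and knowledge from subjective texts on a specific domain or topic, and can provide valuable opinion information to government departments, enterprises and users. Focusing on positive/negative sentiment orientation analysis of Chinese text, this dissertation proposes three sentiment orientation analysis algorithms: 1) an opinion-mining algorithm that extracts opinion quadruples based on rules and sentiment words, together with a machine-learning opinion-mining algorithm using unigram plus opinion-phrase features; 2) an opinion-mining algorithm based on a string kernel; and 3) a Chinese opinion-mining algorithm that goes from sentence level to document level based on rules and an aggregation model.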
With the rapid development of Internet technologies, information on the web is growing exponentially. Meanwhile, the interactive technologies of Web 2.0 enable people to communicate on the Internet and post a wide variety of opinions and comments, so the Internet now holds a great deal of public sentiment information. Because this information is buried in an ocean of data, people face great difficulty in finding what they need. How can public sentiment information about domain-specific events be obtained? Combining focused crawling with sentiment analysis makes this possible. Analyzing public sentiment in a specific domain can support decision making by policy-making departments, help enterprises improve their plans, and provide users with useful information. To meet these demands, this dissertation addresses several key techniques, theories and methods in the following three areas:
     1. The dissertation proposes an incremental focused crawler based on synthetic link value. Topics on the web are interwoven, but pages on the same topic follow certain distribution rules, which we summarize as Hub, Sibling/Linkage Locality, Site subject, Tunnel and Topic trap; the focused crawler is designed around these rules. Recent years have witnessed much research on focused crawlers, but these studies have limitations: they improve recall at the cost of harvest rate and efficiency, and conversely, recall drops when harvest rate is prioritized. In this dissertation, we use a pair of front-end and back-end classifiers to predict a link's topic relevance. The front-end classifier is trained on a linkage context graph, uses visual block partitioning of page content, and predicts whether a link on a web page is topic-relevant based on the link's synthetic value. This gives the crawler the ability to tunnel, i.e., to start from a topic-relevant page, pass through some irrelevant pages, and reach other topic-relevant pages. The back-end classifier recognizes topic-relevant web pages from their text content. Experimental results show that our focused crawler substantially improves recall, harvest rate and efficiency.
     2. A PU-oriented text classification method based on unsupervised clustering is proposed. Traditional text classification models rely on machine learning and need a large labeled corpus as training data, so building an accurate classifier requires many labeled training documents or web pages, often including negative examples. In text classification, labeling is typically done manually by reading the documents, which is labor-intensive and time-consuming. Collecting negative training examples is especially tedious because (1) negative examples must uniformly represent the universal set excluding the positive class (e.g., a sample of non-homepages should represent the whole Internet excluding homepages), and (2) manual collection tends to introduce unconscious human bias, which can degrade classification performance (accuracy, precision, recall, etc.). PU-oriented text classification addresses the setting in which the training set contains no labeled negative documents, or negative examples are very difficult to collect. Traditional classification algorithms cannot perform well without sufficient positive and negative training data, so when a traditional classifier is used for PU-oriented classification, the key step is extracting reliable negative examples from the unlabeled documents. Machine-learning approaches to PU classification therefore often use a two-step strategy that exploits both positive and unlabeled examples: first, a set of reliable negative documents is identified; second, classifiers are built iteratively from the resulting training data. This dissertation proposes the clustering-based reliable negative extraction (CBRN) algorithm, which increases both the number and the accuracy of the extracted reliable negative examples. It also improves an existing PU classification algorithm, yielding SPY-SVM, which builds a set of classifiers iteratively. SPY-SVM randomly selects s% of the documents in the positive set P as spies and adds them to the unlabeled set; the spies help identify negatives in the unlabeled set more accurately, and the classifier is retrained iteratively until a termination condition is met. Experimental results show that our method outperforms other algorithms in accuracy, recall, precision and F1-measure.
     3. Opinion mining, or sentiment analysis, extracts sentiments (or opinions) about a subject from online subjective text documents. At its most basic level, it classifies the sentiment an entire document expresses about a subject. Opinion mining can provide valuable information for government, enterprises and users. The dissertation proposes three semantic orientation analysis algorithms (covering positive, negative and neutral orientation) for Chinese text, described below:
     1) Polarity classification of public health opinions in Chinese text. As public health events break out frequently around the world, people increasingly express their views on these events online, and government agencies need to respond and make policies in light of these views. We study Chinese opinion mining in the context of public health and propose two complementary approaches: a sentiment-word-based approach and a machine learning approach. The sentiment-word-based approach uses rules to extract an opinion quadruple from each sentence. Because different types of sentences contribute differently to the overall polarity, it distinguishes three types (common sentences, first-person sentences and topical sentences) and gives them different weights when combining sentence scores into the overall polarity of a review by weighted average. The machine learning approach extracts unigram and opinion-phrase features from labeled training data, selects features by information gain, and trains a sentiment classification model with ten-fold cross-validation. Experimental results show that both methods perform well.
     2) A string-kernel-based approach to sentiment classification of Chinese reviews. Machine-learning sentiment classifiers depend on a feature vector that represents each text: they usually take words or n-grams as features, build vectors from their presence/absence or frequencies, and train the classification model on these vectors. Feature selection is considered the most important step in document classification, yet feature extraction not only requires extensive expert knowledge but may also discard important information such as word positions and the mutual information between words, and word order is extremely important in sentiment analysis. This dissertation therefore performs sentiment classification of Chinese reviews with machine learning based on a string kernel, whose features are all possible ordered subsequences of characters, so the model can be built without losing this information. Experiments demonstrate the effectiveness of the approach.
     3) Sentiment analysis of Chinese documents from sentence to document level. This dissertation proposes a two-phase rule-based approach: first, determine each sentence's sentiment from word dependencies and contextual modifiers; second, aggregate the sentence polarity scores to predict the document's sentiment. Sentences are weighted to adjust their contribution to the overall polarity based on five features: the position of the sentence, its term weight (tf-isf), its similarity to the headline, the occurrence of keywords in it, and whether it is written in the first person. We compare our approach with three machine-learning-based approaches on two datasets of Chinese articles. It achieves performance comparable to SVM, and because it does not require manual annotation of large amounts of training data, it is much more portable and adaptable to different topic domains. These results illustrate the effectiveness of the proposed method and its advantages over learning-based approaches.
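The front-end/back-end crawler design in item 1 can be illustrated with a best-first crawl loop. This is a minimal sketch under stated assumptions, not the dissertation's crawler: `focused_crawl`, `link_score`, `page_is_relevant`, the in-memory `graph` that stands in for fetching pages, and the tunnelling limit `MAX_TUNNEL` are all hypothetical. The front-end classifier is modeled as a function returning a link's synthetic value, the back-end classifier as a page-relevance predicate, and tunnelling as permission to keep expanding up to `MAX_TUNNEL` consecutive off-topic pages.

```python
import heapq
from typing import Callable

# Hypothetical tunnelling limit: how many consecutive off-topic pages the
# crawler may pass through before it stops expanding a path.
MAX_TUNNEL = 2


def focused_crawl(
    seeds: list[str],
    graph: dict[str, list[str]],              # stand-in for fetching a page and extracting its links
    link_score: Callable[[str, str], float],  # front-end: (parent, link) -> synthetic value in [0, 1]
    page_is_relevant: Callable[[str], bool],  # back-end: is the fetched page on-topic?
    budget: int,                              # maximum number of pages to fetch
) -> list[str]:
    """Best-first crawl; returns the on-topic pages in the order they were fetched."""
    frontier: list[tuple[float, int, str, int]] = []  # (-score, tie-breaker, url, off-topic run)
    seen: set[str] = set()
    counter = 0
    for url in seeds:
        heapq.heappush(frontier, (-1.0, counter, url, 0))
        seen.add(url)
        counter += 1

    harvested: list[str] = []
    fetched = 0
    while frontier and fetched < budget:
        _, _, url, run = heapq.heappop(frontier)  # highest synthetic value first
        fetched += 1
        if page_is_relevant(url):
            harvested.append(url)
            run = 0
        else:
            run += 1
            if run > MAX_TUNNEL:
                continue  # tunnel too long: do not expand this page's links
        for link in graph.get(url, []):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                heapq.heappush(frontier, (-link_score(url, link), counter, link, run))
                counter += 1
    return harvested
```

With `graph = {"A": ["B", "X"], "B": ["D"]}` and only `A` and `D` on-topic, the crawler reaches `D` by tunnelling through the off-topic page `B`; a chain of three off-topic pages before `D` would exceed `MAX_TUNNEL` and be abandoned. The integer counter in each heap entry breaks score ties so URLs are never compared.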
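Item 2 names a clustering-based reliable negative extraction algorithm (CBRN), but the abstract does not give its details. The sketch below shows one plausible reading, and it is an assumption rather than the dissertation's algorithm: cluster the unlabeled document vectors with a cosine (spherical) k-means, then treat every cluster whose centroid is dissimilar to the centroid of the positive set as reliable negatives. The function names, the deterministic initialisation and `sim_threshold` are hypothetical, and the vectors (for example TF-IDF) are assumed to be computed elsewhere.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: Sequence[Vector]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def spherical_kmeans(vectors: Sequence[Vector], k: int, iters: int = 20):
    """k-means that assigns each vector to the centroid with the highest cosine."""
    centers = [list(v) for v in vectors[:k]]  # deterministic init (assumption)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


def cluster_reliable_negatives(
    positives: Sequence[Vector],
    unlabeled: Sequence[Vector],
    k: int = 2,
    sim_threshold: float = 0.2,  # hypothetical cut-off
) -> list[int]:
    """Indices of unlabeled vectors whose cluster is far from the positive centroid."""
    pos_center = centroid(positives)
    labels, centers = spherical_kmeans(unlabeled, k)
    far = {c for c in range(k) if cosine(centers[c], pos_center) < sim_threshold}
    return [i for i, lab in enumerate(labels) if lab in far]
```

Labelling whole clusters, rather than individual documents, is what distinguishes this reading from a simple similarity ranking: an unlabeled document counts as a reliable negative only if its entire neighbourhood lies away from the positive class.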
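The spy step described in item 2, in which s% of P is planted in the unlabeled set, can be sketched as follows. It follows the general spy technique from the PU-learning literature and covers only the reliable-negative step; the iterative SVM training of SPY-SVM is not shown. The `train` callback, assumed to fit any probabilistic classifier on "P minus spies" versus "U plus spies" and return a scoring function, and the default `spy_ratio` and `noise` values are assumptions.

```python
import random
from typing import Callable, Sequence

Doc = str
Scorer = Callable[[Doc], float]
# Hypothetical trainer signature: (positive docs, unlabeled docs) -> scoring function
# that estimates P(positive | doc).
Trainer = Callable[[Sequence[Doc], Sequence[Doc]], Scorer]


def spy_reliable_negatives(
    positives: Sequence[Doc],
    unlabeled: Sequence[Doc],
    train: Trainer,
    spy_ratio: float = 0.15,  # the "s%" of P used as spies (value assumed)
    noise: float = 0.05,      # fraction of spies allowed below the threshold (assumed)
    seed: int = 0,
) -> list[Doc]:
    """Return the unlabeled documents that score below (almost) every spy."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [d for i, d in enumerate(positives) if i in spy_idx]
    kept_pos = [d for i, d in enumerate(positives) if i not in spy_idx]

    # Train "P minus spies" against "U plus spies": the spies are known positives
    # hidden in U, so their scores show how hidden positives in U are scored.
    score = train(kept_pos, list(unlabeled) + spies)

    # Threshold t: the spy score below which only a `noise` fraction of spies fall.
    spy_scores = sorted(score(s) for s in spies)
    threshold = spy_scores[min(len(spy_scores) - 1, int(noise * len(spy_scores)))]

    # Unlabeled documents scoring below t are taken as reliable negatives.
    return [d for d in unlabeled if score(d) < threshold]
```

Any classifier that outputs a score can be plugged in through `train`; the threshold adapts to that classifier because it is read off the spies' own scores rather than fixed in advance.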
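The machine learning approach in item 3, part 1) selects unigram and opinion-phrase features by information gain. For a binary term feature t, the standard definition is IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], where H is entropy over the class labels C. The sketch computes it for documents represented as sets of already-segmented terms; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Iterable


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c) if total else 0.0


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)] for a binary term feature."""
    n = len(labels)
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    h_cond = sum(
        len(part) / n * entropy(Counter(part).values())
        for part in (with_t, without_t)
        if part
    )
    return entropy(Counter(labels).values()) - h_cond
```

For `docs = [{"好", "服务"}, {"好"}, {"差", "服务"}, {"差"}]` with labels `pos, pos, neg, neg`, the term `好` separates the classes perfectly and gets an IG of 1 bit, while `服务` appears equally in both classes and gets 0. Feature selection would then keep the highest-scoring terms.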
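The sentence-type weighting in item 3, part 1) amounts to a weighted average of sentence polarity scores. The abstract does not give the weights, so `TYPE_WEIGHTS` below uses made-up values, and each sentence's polarity score is assumed to have been computed already (for example from its extracted opinion quadruple).

```python
# Hypothetical weights: the abstract says the three sentence types are weighted
# differently but does not give the values used.
TYPE_WEIGHTS = {"common": 1.0, "first_person": 1.5, "topical": 2.0}


def review_polarity(sentences: list[tuple[str, float]]) -> float:
    """Weighted average of (sentence_type, polarity_score) pairs; > 0 means positive."""
    total_weight = sum(TYPE_WEIGHTS[kind] for kind, _ in sentences)
    if total_weight == 0:
        return 0.0
    return sum(TYPE_WEIGHTS[kind] * score for kind, score in sentences) / total_weight
```

A review with one positive common sentence (+1) and one negative topical sentence (-1) scores (1 - 2) / 3, so it comes out negative: the topical sentence dominates because it carries more weight.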
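Item 3, part 2) uses a string kernel whose features are all ordered character subsequences. The best-known kernel of this kind is the gap-weighted subsequence kernel of Lodhi et al. (2002), computed by dynamic programming in O(n|s||t|) time. The sketch below implements that kernel on the assumption that it matches the one used in the dissertation. The decay factor `lam` penalizes gaps. In a typical setup, though not shown here, the normalized values would be given to an SVM as a precomputed kernel matrix.

```python
import math


def ssk(s: str, t: str, n: int, lam: float = 0.5) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) (Lodhi et al., 2002).

    Every common subsequence u of length n contributes lam ** (l_s + l_t), where
    l_s and l_t are the spans u occupies in s and t, so gappy matches count less.
    """
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[a][b] = K'_i(s[:a], t[:b]) for the current order i; K'_0 is 1 everywhere.
    kp = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for i in range(1, n):
        new_kp = [[0.0] * (lt + 1) for _ in range(ls + 1)]  # K'_i is 0 if a or b < i
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b grows
            for b in range(i, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[a - 1][b - 1]
                new_kp[a][b] = lam * new_kp[a - 1][b] + kpp
        kp = new_kp
    # K_n: for each pair of matching last characters, lam^2 * K'_{n-1} of the prefixes.
    return sum(
        (lam * lam * kp[a - 1][b - 1]
         for a in range(1, ls + 1)
         for b in range(1, lt + 1)
         if s[a - 1] == t[b - 1]),
        0.0,
    )


def ssk_normalized(s: str, t: str, n: int, lam: float = 0.5) -> float:
    """Kernel normalized to [0, 1] so that long reviews do not dominate."""
    kss, ktt = ssk(s, s, n, lam), ssk(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return ssk(s, t, n, lam) / math.sqrt(kss * ktt)
```

Because the features are ordered, `ssk("不好", "好不", 2)` is 0 even though both strings contain the same two characters. This is the word-order information that a bag-of-words representation would discard. For "cat" and "car" with n = 2, the only shared subsequence is "ca", giving `lam ** 4`.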
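One of the sentence features in item 3, part 3) is the tf-isf weight (term frequency times inverse sentence frequency). The sketch computes one common tf-isf variant, the sum over a sentence's terms of tf * log(N / sf), where N is the number of sentences and sf is the number of sentences containing the term. It then uses this weight alone to aggregate sentence polarities into a document score. The other four features (position, headline similarity, keywords, first person) and the way the dissertation combines them are omitted because the abstract does not specify them. Sentences are assumed to be word-segmented already.

```python
import math
from collections import Counter


def tf_isf_weights(sentences: list[list[str]]) -> list[float]:
    """Weight of each tokenized sentence: sum over its terms of tf * log(N / sf)."""
    n = len(sentences)
    sf = Counter(term for sent in sentences for term in set(sent))  # sentence frequency
    return [
        sum(tf * math.log(n / sf[term]) for term, tf in Counter(sent).items())
        for sent in sentences
    ]


def document_polarity(sentences: list[list[str]], polarities: list[float]) -> float:
    """tf-isf-weighted average of sentence polarity scores (other features omitted)."""
    weights = tf_isf_weights(sentences)
    total = sum(weights)
    if total == 0:  # no discriminative terms (or no sentences): fall back to a plain mean
        return sum(polarities) / len(polarities) if polarities else 0.0
    return sum(w * p for w, p in zip(weights, polarities)) / total
```

A sentence made only of terms that occur in every sentence gets weight 0, so it does not affect the document score. The effect is to emphasize sentences that carry distinctive content.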

