Research on Web Text Opinion Mining and Hidden Sentiment Inclination
Abstract
An opinion is a person's view or understanding of something: a judgment or evaluation of it. An opinion is not a fact, because it has been neither verified nor proven. If an opinion is later proven and confirmed, it ceases to be an opinion and becomes a fact. From the perspective of a Web visitor, it is therefore more appropriate to treat all information published on the Web as opinion rather than fact. Knowing what other people think and how they judge things has become one of the most important inputs to decision making. The Internet makes it possible to learn the opinions and attitudes of experts and of people we do not know, and more and more people share their own feelings and experiences online. The growing body of opinion resources on the Web, such as personal blogs and online reviews, brings new opportunities and challenges. Opinion mining is the use of information technology to extract and understand other people's opinions.
Sentiment inclination analysis analyzes and mines content that users actively publish on the Web (user-generated content) to identify its sentiment tendency, such as approval, opposition, happiness, or sadness, and may go further to predict how sentiment evolves over time. Analyzing the sentiment inclination of user-generated content helps us understand users' consumption habits, gauge public opinion on current hot topics, and support enterprises and governments in making sound decisions.
However, the information retrieval technology in wide use today, search engines in particular, is keyword-based and cannot retrieve by sentiment or opinion. There are two reasons. First, a sentiment or opinion cannot be represented and indexed by simple keywords. Second, the ranking strategies used in information retrieval are not well suited to opinion mining.
Most current sentiment analysis algorithms rely on people expressing their feelings about products and services in simple terms. However, cultural factors, linguistic nuance, and differing contexts make it difficult to reduce a piece of written text to a simple positive or negative sentiment. This thesis therefore first studies sentiment inclination evaluation models and Web text feature extraction methods in depth, and proposes a continuous sentiment evaluation model and a sentiment evaluation model based on Chinese dependency grammar. On this basis, to mine the topic communities and sentiment trends of Web text, it combines the hidden sentiment inclination evaluation model with Web text community mining algorithms and with the K-Means text clustering method, yielding a fast Web text community mining algorithm, a multi-agent-based Web text community mining algorithm, and a hidden-sentiment-based Web text clustering algorithm. The main contributions are as follows:
(1) Building on the vector space model of Web text, we propose a subjective-word feature extraction method based on Chinese dependency grammar. The method follows Chinese dependency grammar rules to extract the subjective words in a text while avoiding noise as far as possible. Experiments compare IG, MI, CE, and our method under a KNN classifier, using different feature vector spaces and imbalanced sample counts.
(2) Because discrete sentiment inclination evaluation cannot accurately describe how sentiment changes over time, we propose two continuous sentiment inclination evaluation models for Chinese: a Chinese continuous sentiment evaluation model and a sentiment evaluation model based on Chinese dependency grammar. The first aims to provide a comprehensive and accurate model for Chinese sentiment inclination analysis. It identifies the sentiment words in each sentence, determines each sentence's sentiment inclination from the syntactic structure of its context, and then combines the sentence-level results to predict the sentiment inclination of the whole document. Experiments show that this method can accurately trace the trend of Web text sentiment over a given period. The second model uses Chinese dependency grammar rules to determine the prior polarity and the modified polarity of subjective words. Experiments on real Web data show that it classifies sentiment more accurately than the conventional SVM and NB algorithms.
(3) We study Web text community mining. For the two kinds of Web community structure, static and dynamic, we propose respectively a fast Web text community mining algorithm based on hidden sentiment and a multi-agent-based Web text community mining algorithm. The latter is a dynamic community mining algorithm that can effectively find Web text communities sharing the same topic and the same sentiment without prior knowledge of the community structure. Both algorithms take hidden sentiment into account during community mining, and experimental results show that they improve both the precision and the recall of Web text mining.
(4) We improve the classic K-Means text clustering algorithm and propose a Web text clustering algorithm based on hidden sentiment, HSK-Means. It includes a similarity measure that combines hidden sentiment with text features, and an initial center selection algorithm built on a new grading mechanism. A good initial center should both represent the center of its cluster and be well separated from the other centers. Experiments on several types of online text collections compare K-Means, Bisecting K-Means, UPGMA, and HSK-Means; the algorithms with initial center selection (such as Bisecting K-Means and HSK-Means) perform markedly better than those without it.
In summary, this thesis studies Web text opinion mining and hidden sentiment inclination analysis of Chinese text. It focuses on how to evaluate the hidden sentiment inclination of text more accurately, that is, the problem of continuous sentiment inclination evaluation; it gives two different algorithms for mining static and dynamic Web text communities; and it presents a Web text clustering algorithm based on hidden sentiment and initial center selection. Combining hidden sentiment analysis with community mining gives a more accurate and complete picture of what opinion holders really mean, and helps those who rely on these opinions make sound decisions. The algorithms and implementation methods are novel and have both theoretical and practical value, and the work is relevant to further research in opinion mining and sentiment analysis.
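The sentence-to-document procedure described in contribution (2) can be illustrated with a minimal sketch. This is not the thesis's model: it assumes English tokens instead of Chinese, a toy lexicon (`PRIOR_POLARITY`, `NEGATORS`, `INTENSIFIERS` are all hypothetical), and a two-token look-back window standing in for dependency grammar rules. It only shows how a prior polarity can be adjusted by modifiers into a continuous sentence score, and how sentence scores can be combined into a document score.

```python
import re

# Hypothetical toy lexicon: prior polarity of sentiment words in [-1, 1].
PRIOR_POLARITY = {"good": 0.6, "excellent": 1.0, "bad": -0.6, "terrible": -1.0}
# Hypothetical modifiers that change the prior polarity.
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def sentence_score(sentence):
    """Return a continuous sentiment score in [-1, 1] for one sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    scores = []
    for i, tok in enumerate(tokens):
        if tok not in PRIOR_POLARITY:
            continue
        score = PRIOR_POLARITY[tok]
        # Assumption: the two preceding tokens approximate the word's
        # modifiers; a real system would read them from a dependency parse.
        for mod in tokens[max(0, i - 2):i]:
            if mod in NEGATORS:
                score = -score
            elif mod in INTENSIFIERS:
                score *= INTENSIFIERS[mod]
        # Clamp the modified polarity back into [-1, 1].
        scores.append(max(-1.0, min(1.0, score)))
    # Sentences without sentiment words are treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


def document_score(text):
    """Average the scores of the opinionated (non-neutral) sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scored = [sentence_score(s) for s in sentences]
    opinionated = [s for s in scored if s != 0.0]
    return sum(opinionated) / len(opinionated) if opinionated else 0.0
```

Because the output is a real number rather than a positive/negative label, document scores computed for texts from successive time windows can be plotted directly as a sentiment trend, which is the motivation the abstract gives for continuous evaluation. Averaging is only one possible way to combine sentence scores and is chosen here for simplicity.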
