In recent years, with the rapid development of online media, more and more people have begun to post comments in forums and express their opinions. Articles and sentences carrying personal sentiment polarity have appeared online in large numbers. Customer reviews on the Internet have an important influence on the purchase decisions of online consumers, and automatically extracting valuable information from this mass of review text has become an urgent problem. This paper applies traditional text classification methods to sentiment text classification, treating it as a problem to be solved with statistical methods. Drawing on techniques from traditional topic-based Chinese text classification, it studies the key techniques of Chinese sentiment text classification, with a focus on improving classification precision. It analyzes how different feature selection methods, feature representation methods, and classification models affect the accuracy of sentiment classification.
This paper studies the key techniques of Chinese sentiment text classification and ultimately settles on an effective classification model, comprising an effective feature selection method, feature representation model, and sentiment text classifier. It constructs four different stop word lists based on the characteristics of sentiment text classification and analyzes, through experiments, how each of the four lists contributes to the classification results, thereby identifying the most effective stop word list. Finally, the classification model is applied to a practical problem to verify the validity of the research results: an SVM-based text classification model is used for product recommendation, in a classification experiment on product reviews collected from a well-known shopping site. The experiment extracts the effective features of the consumer product reviews and the polarity of the sentiment text. The paper gives a reasonable analysis of the final results and puts forward some constructive suggestions on the application of sentiment text classification.
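As a rough illustration of the preprocessing steps described above (stop word removal, feature selection, feature representation), the following is a minimal sketch. The abstract does not say which feature selection or representation methods were chosen, so chi-square scoring and raw term counts are assumptions here, as are the toy reviews, the stop list, and every function name. The reviews are assumed to be already word-segmented, and the SVM classifier itself is left out: the resulting vectors would be passed to an SVM implementation.

```python
from collections import Counter


def remove_stop_words(tokens, stop_list):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_list]


def chi_square_scores(docs, labels):
    """Chi-square statistic of each term against a binary label (1 = positive)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_df if label == 1 else neg_df
        target.update(set(tokens))  # document frequency: count each term once per doc
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken by term order)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_vector(tokens, features):
    """Represent a document as raw term counts over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


# Hypothetical, pre-segmented toy reviews (1 = positive, 0 = negative).
reviews = [
    ["这", "手机", "很", "好"],
    ["质量", "很", "好", "的"],
    ["这", "手机", "很", "差"],
    ["质量", "很", "差", "的"],
]
labels = [1, 1, 0, 0]
stop_list = {"这", "的", "很"}  # hypothetical stop list, not one of the paper's four

filtered = [remove_stop_words(doc, stop_list) for doc in reviews]
features = select_features(filtered, labels, k=2)
vectors = [to_vector(doc, features) for doc in filtered]
```

Comparing several stop lists, as the paper does, would amount to repeating these steps once per list and measuring classifier accuracy on held-out reviews for each.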
