Sentiment analysis, a hot research topic in the field of information extraction, has attracted increasing attention since the beginning of this century. With the rapid development of the Internet, and especially the rising popularity of Web 2.0 technology, network users have become not only receivers of information but also content creators. Meanwhile, thanks to the development and maturation of natural language processing and machine learning technology, it has become possible to apply sentiment analysis widely to subjective texts.
     Existing studies on sentiment analysis mainly focus on determining sentiment polarity at the document level and the sentence level. This task has proven valuable in real applications such as the analysis of Internet public opinion and stock reviews. However, as these applications deepen, users raise more demanding requirements. For instance, some users want sentiment analysis results for individual target attributes. Traditional sentiment analysis techniques cannot meet such demands. Therefore, this study proposes a fine-grained sentiment analysis method, exploring new research ideas and methods to further improve the accuracy and practicality of sentiment analysis.
     This paper focuses on the key technologies of fine-grained sentiment analysis. Its content includes:
     1) Quantifying the polarity strength of sentiment words. Research on determining the polarity of sentiment words is relatively mature. However, fine-grained sentiment analysis requires a quantified sentiment strength to support sentiment counting. Building on existing algorithms for quantifying sentiment strength, we present an improved strategy: first, we classify the sentiment words into different categories; second, for each category, we design different calculation rules and methods to quantify the sentiment strength. The main advantage of the proposed approach is that it makes full use of the relationship between characters and words, as well as linguistic knowledge.
     2) A joint model for recognizing the target attribute and its sentiment expression. In fine-grained sentiment analysis, it is important to correctly recognize the target attribute and the sentiment expression in the text. Drawing on the theory of Conditional Random Fields (CRF), we make effective use of the class relation between the target attribute and its sentiment expression and introduce a joint recognition model based on the sequence structure. Additionally, this paper analyzes the basic and semantic features and their extraction methods. In particular, it analyzes the extraction of semantic features and designs a novel algorithm for it.
     3) An attribute classification method based on semi-supervised learning, and sentiment calculation. This paper introduces semi-supervised learning into attribute classification to reduce the dependence on tagged corpora and thereby ease the difficulty of annotating a fine-grained sentiment corpus. First, this paper studies an initial seed selection strategy based on stratified sampling and compares its experimental performance with that of seed selection without stratified sampling. Second, this paper applies this selection strategy at each iteration of the bootstrapping process and discusses the termination condition of the bootstrapping iteration. Third, for reviews that contain a sentiment word without a target attribute, pointwise mutual information (PMI) is adopted to measure the association between target attributes and sentiment words. In this way, the proposed approach can reasonably classify sentiment words that lack a target attribute, making the sentiment summing more reasonable and effective.
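     The PMI-based assignment described in item 3) can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation used in this paper: the function names `pmi_table` and `assign_attribute`, the review-level co-occurrence counting, and the rule of choosing the attribute with the highest PMI are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def pmi_table(reviews):
    """Estimate PMI(attribute, sentiment word) from review-level co-occurrence.

    `reviews` is a list of (attributes, sentiment_words) pairs, each a set of
    strings. This input format is an assumption made for the sketch.
    """
    n = len(reviews)
    attr_count, word_count, pair_count = Counter(), Counter(), Counter()
    for attrs, words in reviews:
        attr_count.update(attrs)
        word_count.update(words)
        pair_count.update((a, w) for a in attrs for w in words)
    # PMI(a, w) = log2( p(a, w) / (p(a) * p(w)) ), with probabilities
    # estimated as counts divided by the number of reviews n.
    return {
        (a, w): math.log2(c * n / (attr_count[a] * word_count[w]))
        for (a, w), c in pair_count.items()
    }


def assign_attribute(word, table, default=None):
    """Pick the attribute with the highest PMI for a sentiment word.

    Returns `default` when the word never co-occurred with any attribute.
    """
    candidates = {a: score for (a, w), score in table.items() if w == word}
    return max(candidates, key=candidates.get) if candidates else default
```

     For example, if "clean" co-occurs mostly with the attribute "room" in the labeled reviews, a review that only says "clean" would be counted under "room" during sentiment summing. Pairs that never co-occur are simply absent from the table rather than given a PMI of negative infinity.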
     The main contributions of this paper are summarized as follows. First, in theory, it presents a detailed analysis of the fuzziness of sentiment strength. For the task of sentiment analysis, it makes full use of the relationship between characters and words, as well as linguistic knowledge, and optimizes the method for quantifying sentiment strength, which achieves some improvement in performance. Second, for jointly recognizing target attributes and the corresponding sentiment expressions, this paper proposes a joint recognition model based on the sequence structure that makes full use of the basic and semantic features in the review. Moreover, by adjusting the template of the CRF classifier, this paper further analyzes the effect of feature combination and context information on recognition performance, which helps improve that performance. In addition, this paper covers attribute classification and sentiment calculation and verifies the validity of the semi-supervised learning method in attribute classification. Third, by designing a reasonable sentiment calculation method, it completes sentiment summing based on attribute classification and realizes fine-grained sentiment statistics. Finally, this paper designs a fine-grained sentiment analysis application system for hotel reviews and builds it with encapsulated core functions and a friendly user interface.
