Research on Several Key Issues in Opinion Mining
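Abstract
With the spread of the Internet and the rapid growth of e-commerce, the Web now holds a large volume of consumer reviews of products. These reviews contain positive or negative evaluations of product performance, functionality, and other aspects. By tracking this information, merchants and manufacturers can obtain consumer feedback promptly and improve their products; potential consumers can learn from other consumers' experience and make better-informed purchases. However, the Web contains a massive amount of unstructured or semi-structured review text, and reading it manually is time-consuming and laborious. Opinion mining of user reviews has emerged in response and has become an active research topic in Web information processing in recent years.
     This dissertation studies key issues in opinion mining, namely opinion target identification, opinion content analysis, and sentiment acquisition. It explores how domain ontology can support these tasks and what role it plays, drawing on information extraction, text classification, and natural language processing techniques. The research combines methodological exploration with empirical analysis. The work and its contributions are as follows:
     (1) Building on an analysis of existing methods and techniques, and borrowing the life-cycle-based model from software engineering, an incremental iterative method for ontology construction is proposed. The method divides ontology construction into three phases carried out in multiple steps. Applied to the task in this dissertation, it enriches and refines the knowledge structure of the domain ontology by creating instances, and yields NBO (Notebook Ontology), a domain ontology for notebook computers used in product named entity recognition.
     (2) After defining and systematically analyzing the task and methods of product named entity recognition, the dissertation studies recognition with the Conditional Random Fields (CRFs) model. Choices made for key issues in the recognition process, including the size of the "observation window", the modeling granularity, the label set, and feature selection, are validated experimentally. To further improve recognition performance, a new external feature, the ontology feature, is introduced into the CRFs model. Experiments show that combining internal and external features achieves the desired performance in recognizing product name entities, product attribute name entities, and product component name entities.
     (3) Building on traditional topic-based text classification, machine learning methods are used for coarse-grained sentiment classification of text. To address data sparseness, a sentiment vector space model is proposed for text representation, and experiments analyze and compare key issues in sentiment classification: the choice of classification algorithm, the use of feature selection methods, and the choice of feature dimension. To account for a feature word's contribution both to the whole corpus and to each category, the ideas of document frequency and the chi-square statistic are combined in a proposed feature selection method, the chi-square difference between the positive and negative categories (CDPNC). Its macro-averaged and micro-averaged F-measures reach 90.18% and 90.08%, respectively.
     (4) Building on semantic-analysis-based sentiment classification, dependency parsing is used to extract feature-opinion pairs. For the sentiment classification of opinion words, and in view of the differences between Chinese and English expression, the semantic orientation method based on pointwise mutual information is improved in the selection of positive and negative reference word pairs, the setting of thresholds, and related issues. The improved method is shown to be feasible for sentiment classification of Chinese review text, and it compensates for the shortcomings of opinion-word sentiment classification based on HowNet semantic similarity.
     (5) Based on the above results, the dissertation presents the architecture of an opinion mining system and designs and implements a prototype. The system mines opinions at different granularities, covering overall product reviews as well as reviews of a product's general features and detailed features. It presents product and review query results, product opinion query results, and product opinion comparison results to the user in visual form.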
With the popularization of the Internet and the rapid development of e-commerce, the Web stores a huge number of customer reviews about products. These reviews contain customers' positive or negative feelings about product performance, functionality, etc. Businesses and manufacturers can analyze these reviews and obtain consumer feedback in time to improve product performance and after-sales service. Potential consumers can learn from the product experiences described in online reviews and purchase more sensibly. However, processing an enormous amount of unstructured or semi-structured reviews manually is extremely expensive and time-consuming. Research on opinion mining of customer reviews has therefore attracted growing attention and has become a hotspot in recent research on Web information processing.
     This dissertation addresses several key issues of opinion mining, explores the concrete ways in which domain ontology supports them and the effects it has, and tackles these tasks by combining information extraction, text mining, and natural language processing techniques. It places particular emphasis on methodological research combined with empirical analysis, proposes new methods based on domain ontology, and obtains the following results:
     Firstly, based on an analysis of existing methods and techniques of domain ontology construction, an incremental iterative method was proposed for constructing domain ontology. It divides the construction process into three phases and ten levels. Using this method, the knowledge framework of the domain ontology was enriched and refined through the creation of instances, and a Notebook Ontology was constructed for Product Named Entity Recognition (PNER).
     Secondly, based on an analysis of the tasks and methods of product named entity recognition, a Conditional Random Fields (CRFs) model was applied to PNER. Choices made for key issues in the recognition process, such as the size of the "observation window", the modeling granularity, the labeling scheme, and the selection of features, were verified by experiments. To further improve PNER performance, a new external feature, namely the domain ontology feature, was introduced into the CRFs model. Experimental results showed that the combination of internal and external features performed well, and the F-measures for ETY, ATT, and PART on the test set reached the desired level.
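
To make the "observation window" and the external ontology feature concrete, the following is a minimal sketch of how per-token features for a CRF sequence labeler might be assembled. It is an illustration under stated assumptions, not the dissertation's feature set: the function names, the feature keys, the `<PAD>` symbol, and the example tokens and ontology terms are all hypothetical.

```python
from typing import Dict, List, Set


def window_features(tokens: List[str], i: int, window: int,
                    ontology_terms: Set[str]) -> Dict[str, str]:
    """Build features for token i from a symmetric observation window.

    `window` is the number of tokens looked at on each side of position i.
    `ontology_terms` stands in for a lookup against a domain ontology
    (an assumption; the abstract does not describe the lookup itself).
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the sentence are padded.
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # External feature: does the current token appear in the ontology?
    feats["in_ontology"] = "1" if tokens[i] in ontology_terms else "0"
    return feats


def sentence_features(tokens: List[str], window: int,
                      ontology_terms: Set[str]) -> List[Dict[str, str]]:
    """Feature dictionaries for every token of one sentence."""
    return [window_features(tokens, i, window, ontology_terms)
            for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical segmented review sentence and ontology entries.
    sentence = ["ThinkPad", "的", "屏幕", "很", "清晰"]
    ontology = {"ThinkPad", "屏幕"}
    for token, feats in zip(sentence, sentence_features(sentence, 1, ontology)):
        print(token, feats)
```

A larger `window` gives the model more context per token at the cost of sparser features, which is presumably why the window size is one of the settings checked experimentally.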
     Thirdly, building on traditional topic-based text classification, machine learning was applied to coarse-grained sentiment classification of reviews. To address data sparseness, a sentiment Vector Space Model (s-VSM) was used to represent text. The critical issues of sentiment classification, i.e. the choice of classification algorithm, the choice of feature selection method, and the choice of feature dimension, were examined by experiments. Furthermore, to account for a feature's contribution both to the entire corpus and to each category, a feature selection method, the Chi-square Difference between the Positive and Negative Categories (CDPNC), was proposed. It combines document frequency (DF) with the chi-square statistic (CHI). Experiments showed that its Macro-F and Micro-F reached 90.18% and 90.08%, respectively.
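
The abstract does not give the CDPNC formula, so the sketch below covers only the standard building blocks named in this paragraph: the chi-square statistic of a term for a category, and the macro- and micro-averaged F-measure used to report the results. The function names and the example counts are hypothetical and are not taken from the dissertation.

```python
from typing import Dict, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    The document frequency (DF) of the term is a + b.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def macro_micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (macro-F1, micro-F1) from per-class (tp, fp, fn) counts."""
    def f1(tp: int, fp: int, fn: int) -> float:
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Macro: average the per-class F1 scores.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over classes, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical term that occurs mostly in positive reviews.
    print(chi_square(a=8, b=2, c=2, d=8))
    # Hypothetical confusion counts for the two polarity classes.
    print(macro_micro_f1({"pos": (45, 5, 4), "neg": (46, 4, 5)}))
```

With only two categories, the plain chi-square value of a term is the same whether it is computed for the positive or the negative class, so the statistic by itself does not indicate which polarity the term favors.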
     Fourthly, semantic analysis was introduced into sentiment classification, and dependency parsing was used to extract feature-opinion pairs. Because of the differences between Chinese and English, semantic orientation computing based on Pointwise Mutual Information (PMI) cannot be applied directly to the sentiment classification of Chinese reviews. With the practical application in mind, this dissertation improved the selection of positive and negative reference words, the threshold setting, and related aspects, and verified that PMI-based semantic orientation computing is feasible for the sentiment classification of Chinese reviews and can overcome the weaknesses of semantic similarity computing based on HowNet.
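
As an illustration of PMI-based semantic orientation, the sketch below scores an opinion word by its association with positive reference words minus its association with negative reference words, in the general style of SO-PMI. It is not the improved method of the dissertation: the co-occurrence unit (one review as a set of tokens), the smoothing constant, the threshold, and the example reference words and reviews are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Set


def pmi(word: str, seed: str, docs: Sequence[Set[str]],
        smoothing: float = 0.01) -> float:
    """Smoothed pointwise mutual information from document co-occurrence."""
    n = len(docs)
    c_word = sum(1 for d in docs if word in d)
    c_seed = sum(1 for d in docs if seed in d)
    c_both = sum(1 for d in docs if word in d and seed in d)
    # Additive smoothing keeps the logarithm defined when a count is zero.
    return math.log2(((c_both + smoothing) * n)
                     / ((c_word + smoothing) * (c_seed + smoothing)))


def so_pmi(word: str, pos_seeds: List[str], neg_seeds: List[str],
           docs: Sequence[Set[str]]) -> float:
    """Semantic orientation: association with positive minus negative seeds."""
    return (sum(pmi(word, s, docs) for s in pos_seeds)
            - sum(pmi(word, s, docs) for s in neg_seeds))


def classify(score: float, threshold: float = 0.0) -> str:
    """Map a score to a polarity label; the threshold is a free parameter."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical tokenized reviews and reference (seed) words.
    reviews = [{"清晰", "好"}, {"清晰", "好"}, {"卡顿", "差"}, {"卡顿", "差"}]
    for w in ("清晰", "卡顿"):
        score = so_pmi(w, ["好"], ["差"], reviews)
        print(w, round(score, 2), classify(score))
```

The choice of reference words and the threshold both shift the resulting labels, which matches the two settings this paragraph names as needing adjustment for Chinese reviews.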
     Finally, based on the aforementioned research, an opinion mining prototype system was designed and implemented. It mines customer reviews comprehensively, both for the product as a whole and for its detailed features. Users obtain visualized results from the system, which can support their decision making.
