Research on Text Mining Methods in Online Health Communities
Abstract
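As people pay increasing attention to their own health, attitudes toward health care are shifting from passive disease treatment to active health self-management. Patients cannot take an active part in decisions about their diagnosis and treatment, or in day-to-day self-management, without a good platform for exchanging information. The rapid growth of online health communities in recent years has made such exchange possible: large numbers of users join these communities to seek and share experience of personal health care and of diagnosis and treatment and to voice their views on health topics, and the communities also give patients and their families a place to communicate emotionally and seek emotional support. Understanding and analyzing online health communities in depth is therefore a meaningful research topic. For community websites, it can help optimize the human-computer interface and provide more personalized tools and functions that make it easier for members to join discussions and that encourage participation. For users, it can help them understand this emerging form of online communication more quickly, find the topics that interest them or the members they wish to talk to, and integrate better into the community.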
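Because online health communities play an increasingly important role in daily life, they have become a focus for many researchers. Studies have approached them from several angles, for example analyzing how different groups participate, exploring popular health-related topics, and analyzing members' emotional expression and communication. Most of this work, however, relies on questionnaires or on content analysis with manual annotation. As the communities grow and the volume of community text keeps increasing, these manual methods become inefficient, lack rigor and objectivity, and cannot support effective analysis. This paper therefore explores machine learning and text mining methods for a systematic analysis of the main research questions about online health communities. The work covers three areas: identification of hot health topics, identification of community member roles, and analysis of members' emotional expression.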
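(1) Hot health topic identification. Members of an online health community can freely discuss the topics that interest them, but studies have found that the disorganized way community information is arranged makes it hard for users to find what they need quickly, and hard for community websites and researchers to discover users' interests in and needs for different health topics. We therefore propose a method for automatically identifying hot health topics. Drawing on external medical knowledge sources such as UMLS, we extract from forum posts a feature set that can represent health topics effectively, comprising n-gram features, domain-specific features, and sentiment features. Text clustering then partitions the posts by topic into clusters, each of which represents one class of hot health topic, and keywords extracted from each cluster are used to identify the topic. For testing and evaluation we used the internationally known online health community Medhelp as the data source and collected sample data for three typical diseases: lung cancer, breast cancer, and diabetes. After fixing the model parameters we obtained 7 result clusters in each case and, through keyword extraction, defined 7 hot health topics: detailed personal introduction, emotional support, symptoms, examinations, complications, medication, and treatment. We then verified the effectiveness of the method. Further discussion showed that the distribution of hot topics differs markedly across disease forums: symptoms in the lung cancer forum, examinations in the breast cancer forum, and medication in the diabetes forum each account for a clearly larger share than the other topics.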
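(2) Community member role identification. Different kinds of people take part in online health communities, with different goals and needs, and they show different role characteristics. Identifying member roles lets a website offer differentiated services that meet different members' needs, and helps members understand and trust one another. However, for reasons such as privacy protection, usable personal information is scarce, which makes role identification difficult. We therefore bring in stylistics-based theory for identifying the role of a text's author and propose a role identification method for online health communities that judges a member's role type from the stylistic writing features of their posts. The stylistic features extracted are lexical, syntactic, and structural; combined with content-related features they form the feature set. Text clustering then partitions all posts by their stylistic features, which yields the identification of member roles. In the experiments we again used posts from the three disease forums of the Medhelp community as the corpus, identified the three main roles of patient, caregiver, and medical expert, and analyzed the differences among the three groups in further discussion.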
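(3) Analysis of members' emotional expression. An online health community is a platform with broad user participation, and members' posts carry rich emotional expression. We therefore develop a sentiment analysis approach for online health communities that identifies posts containing emotional expression and analyzes their sentiment orientation. We first review sentiment analysis techniques for web text, and then design the research framework around two approaches, one based on machine learning and one based on a sentiment lexicon, to classify posts as subjective or objective and to classify sentiment polarity. In the machine learning approach, features that discriminate sentiment, such as domain features, part-of-speech features, and stylistic features, form the feature set, and an SVM classifier performs the sentiment classification. In the lexicon-based approach, we explore how to use an external sentiment knowledge source to extract sentiment words from the text and compute the strength of their sentiment, and we set a polarity criterion for judging the sentiment orientation of the text. Experiments showed that each approach has strengths and weaknesses, so we combined the two and validated the combined sentiment analysis model with appropriate evaluation measures. In further analysis we examined and summarized the characteristics of members' emotional expression from several angles, including disease type, health topic, and member role.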
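The topic identification pipeline in item (1) can be illustrated with a small sketch. It is not the author's implementation: the abstract does not name the clustering algorithm, so cosine k-means is assumed here, `DOMAIN_TERMS` is a hypothetical stand-in for terms matched against UMLS, the up-weighting factor is arbitrary, and the sentiment features are left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a UMLS-derived term list; not real UMLS output.
DOMAIN_TERMS = {"chemo", "tumor", "insulin", "glucose"}

def features(post):
    """Bag of unigrams and bigrams, plus a marker feature per domain term."""
    tokens = re.findall(r"[a-z]+", post.lower())
    feats = Counter(tokens)
    feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    for tok in tokens:
        if tok in DOMAIN_TERMS:
            feats["DOMAIN:" + tok] += 2  # assumed weight for domain terms
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iterations=10):
    """Cosine k-means with deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to every centroid chosen so far.
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # summed counts; cosine ignores the scale
                centroids[j] = total
    return labels, centroids

def top_keywords(centroid, n=3):
    """Highest-weight unigrams of a cluster, used to name its topic."""
    plain = {t: w for t, w in centroid.items()
             if " " not in t and not t.startswith("DOMAIN:")}
    ranked = sorted(plain.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:n]]

posts = [
    "my tumor shrank after chemo",
    "chemo for the tumor starts monday",
    "insulin dose and glucose levels",
    "glucose spikes even with insulin",
]
vectors = [features(p) for p in posts]
labels, centroids = kmeans(vectors, k=2)
```

On these four toy posts the two cancer posts and the two diabetes posts fall into separate clusters, and `top_keywords` returns the terms a reader would use to label each one. A realistic run would also remove stop words and weight terms (for example with TF-IDF) before clustering.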
The main contributions of this paper are the following three:
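(1) A text-clustering method for identifying hot topics in online health communities. Current research on hot topics in these communities mostly relies on manual counting and annotation, which is inefficient and lacks rigor. This paper applies text clustering to health topic identification in online communities. On top of the traditional text representation, it introduces an external medical knowledge source to extract domain-specific features that discriminate health topics, and further adds sentiment features to improve clustering. Experiments verify the effectiveness of the resulting topic identification model.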
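(2) A stylistics-based method for identifying member roles in online health communities. Identifying member roles is a basis and precondition for analyzing online health communities, yet the scarcity of personal profiles and the need to protect privacy make roles hard to identify, so little research has taken this angle. This paper proposes a stylistics-based role identification method that extracts from web text the lexical, syntactic, and structural features of posting style that distinguish types of community members, and uses them to determine member roles. Experiments verify the effectiveness of the method.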
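(3) A combined sentiment analysis model suited to online health communities. Because the machine learning approach and the sentiment lexicon approach each have advantages and drawbacks when applied to text in online health communities, we propose a model that combines them. For subjective/objective classification we use machine learning, with a feature set of domain features, part-of-speech features, and stylistic features. For polarity analysis of the subjective texts we use the lexicon-based approach, drawing on the external sentiment lexicon SentiWordNet to extract polarity words and judge the overall polarity of the text. Experimental results show that the proposed model performs well for sentiment analysis in online health communities.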
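The two-stage structure of the combined model in (3) can be sketched as follows. This is an illustration under stated assumptions, not the model evaluated in the thesis: `LEXICON` is a hypothetical six-word table rather than SentiWordNet, `looks_subjective` is a crude first-person rule standing in for the trained SVM subjectivity classifier, and the `threshold` parameter and the "neutral" label are additions made for the example.

```python
import re

# Hypothetical mini-lexicon with scores in [-1, 1]; a real system would
# load scores from a resource such as SentiWordNet instead.
LEXICON = {
    "hope": 0.6, "better": 0.5, "grateful": 0.8,
    "scared": -0.7, "pain": -0.6, "worse": -0.5,
}
FIRST_PERSON = {"i", "my", "me", "we"}

def tokenize(post):
    return re.findall(r"[a-z]+", post.lower())

def looks_subjective(post):
    """Stand-in for the SVM subjectivity classifier (assumed rule)."""
    return bool(FIRST_PERSON & set(tokenize(post)))

def polarity_score(post):
    """Sum the lexicon scores of the sentiment words found in the post."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokenize(post))

def classify(post, is_subjective=looks_subjective, threshold=0.0):
    """Stage 1 filters objective posts; stage 2 scores polarity by lexicon."""
    if not is_subjective(post):
        return "objective"
    score = polarity_score(post)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Passing the subjectivity check as a function keeps the two stages separable, so a trained classifier can replace the rule without touching the lexicon stage.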
In recent years, people have paid more and more attention to their health. Their perceptions of health care have gradually changed from passive disease treatment to active health self-management. However, it is difficult for people to participate actively in treatment decisions and day-to-day health self-management without a good platform for exchanging information. Online health communities have grown rapidly in recent years; there, patients and their caregivers exchange information that interests them, share their experiences, and offer emotional support and encouragement. A thorough understanding of online health communities is therefore a significant research issue. Our study could help websites optimize the human-computer interface and provide personalized tools and functions that facilitate patient engagement and improve ease of use and social interaction. More importantly, our study is of great help to the end users of online health communities themselves: it could give them a sense of what online health communities are, help them quickly find the issues they are concerned about, and let them become involved in online health communities more easily.
Online health communities have become a hot research issue as they play an increasingly important role in people's daily lives. Many studies have been done from different perspectives, including exploring health-related hot topics and analyzing the characteristics of community members and their emotional expression. But most of this research adopted methods based on questionnaires or content analysis. Faced with the growing number of community members and their posts, these traditional manual methods can no longer process the huge amount of data. Therefore, we planned to use automatic methods such as machine learning and text mining to study the hot issues in online health communities, including health-related hot topic identification, community members' role identification, and sentiment analysis of community members.
(1) Health-related hot topic identification. Members discuss the health-related topics that interest them in online health communities. However, the unordered text structure makes it difficult for users to retrieve valuable information, and it is also hard for web designers and researchers to discover community members' needs. Thus, we proposed an automatic identification framework for health-related hot topics. With the help of the UMLS medical knowledge source, we extracted n-gram features, domain-specific features, and sentiment features that could effectively represent health-related topics. Then, using text clustering, we divided all the text data into clusters, each representing a health-related hot topic. Finally, all topics could be identified from the extracted keywords. We then ran an experiment to evaluate our method. We chose the well-known online health community Medhelp as the data source and collected sample data from three disease discussion boards: lung cancer, breast cancer, and diabetes. After determining the values of the model parameters, we got 7 clusters from the three disease forums, representing 7 health-related hot topics: personal self-introduction, emotional expression, symptoms, examination, complications, medication, and treatment. Further analysis of the results showed that the distributions of health-related hot topics differ significantly across disease types: discussions of symptoms in the lung cancer forum, examination in the breast cancer forum, and medication in the diabetes forum are significantly more frequent than discussions of other topics.
(2) Participants' role identification. Different types of participants are involved in online health communities, and they have different demands and behavioral characteristics. Identifying the different types of community members helps websites provide personalized services that meet the needs of different users, and also helps community members build mutual understanding and trust. However, the lack of personal information caused by privacy protection makes it difficult to identify community members' roles. We therefore introduced stylistics-based theory of author role identification to construct a participant role identification method for online health communities. We determine a member's role type from the writing characteristics of their posts: we extract stylistic features, including lexical, syntactic, and structural features, and combine them with content-related features to generate the feature set. We then use a text clustering algorithm to group all posts according to their stylistic writing characteristics and so identify community members' roles. Finally, we used the same sample data as in the hot topic identification experiment and ran an experiment to identify the three main roles in online health communities (patients, caregivers, and medical experts), and further discussed the differences among the three groups.
(3) Sentiment analysis of community members. Community members express their emotions through their posts in online health communities. We proposed to use sentiment analysis to identify the subjective posts, which contain members' emotional expressions, and to analyze their polarity. First, we proposed a sentiment analysis method based on machine learning. By choosing a feature set including domain-specific features, POS features, and stylistic features, we classified all forum posts into objective and subjective posts and further classified the subjective posts into positive and negative posts. Meanwhile, we proposed another sentiment analysis method, based on a sentiment dictionary, which identifies the sentiment expressed in forum posts by extracting the sentiment words from each post and summing their sentiment values. Experimental tests showed that each method has advantages and disadvantages, so we finally proposed a comprehensive sentiment analysis model that combines the two. In the final discussion, we analyzed and summarized the characteristics of community members' emotional expression from multiple perspectives, such as disease type, health topic, and member role.
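The stylistic features used for role identification in item (2) can be made concrete with a short sketch. The abstract names only the feature families (lexical, syntactic, structural), so every specific feature below, the `FUNCTION_WORDS` subset, and the function name `stylistic_features` are illustrative assumptions rather than the thesis's actual feature set.

```python
import re

# Illustrative subset of function words; the real list is not given.
FUNCTION_WORDS = ("the", "of", "and", "to", "i", "my")

def stylistic_features(post):
    """Map one forum post to a small dictionary of style measurements."""
    words = re.findall(r"[A-Za-z']+", post)
    lower = [w.lower() for w in words]
    sentences = [s for s in re.split(r"[.!?]+", post) if s.strip()]
    paragraphs = [p for p in post.split("\n\n") if p.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty posts
    feats = {
        # Lexical: word length and vocabulary richness.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(lower)) / n_words,
        # Syntactic: punctuation use (function words are added below).
        "question_marks": post.count("?"),
        # Structural: how the post is laid out.
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "words_per_sentence": len(words) / (len(sentences) or 1),
    }
    for fw in FUNCTION_WORDS:
        feats["fw_" + fw] = lower.count(fw) / n_words
    return feats

example = stylistic_features("I am worried. Is my cough normal?\n\nThanks!")
```

Each post becomes one feature vector; clustering those vectors, as the abstract describes, would then group posts by writing style rather than by topic.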
The contributions of this paper are as follows:
(1) We proposed a method for health-related hot topic identification using text clustering. Current research on health-related hot topics has been based on manual statistics, resulting in low efficiency and a lack of rigor. In this paper, text clustering was therefore introduced into health-related topic identification. Building on the traditional text representation, we proposed adding domain-specific features and sentiment features to the representation to improve topic identification. Both kinds of features proved effective in distinguishing health-related topics in the subsequent experiment.
(2) We proposed a method for participant role identification based on stylistics. A better understanding of the different roles of community members is important for studying online health communities. However, the lack of personal profiles and the need for privacy protection make it difficult to identify members' roles, so few studies have been done in this field. In this paper we proposed a stylistics-based role identification method that extracts lexical, syntactic, and structural features, which effectively distinguish the writing styles of different types of participants, to identify members' roles.
(3) We proposed a comprehensive sentiment analysis model for online health communities that combines the two sentiment analysis methods, one based on machine learning and one based on a sentiment dictionary. To separate subjective posts from objective posts, we used the machine learning method, constructing the feature set from domain-specific features, POS features, and stylistic features. In the subsequent polarity analysis of the subjective posts, we used the dictionary-based method, extracting sentiment words to judge each post's polarity. The methods proved effective in the subsequent experiments.
