Research on Gender Classification Based on Text Mining
Abstract
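The emergence and development of network technology has brought enormous changes to people's lives and has diversified existing patterns of living, business, and communication; people can now communicate instantly through many kinds of communication tools. This raises two questions. Does this non-verbal mode of communication exhibit the same gender differences found in spoken settings? And can gender be classified from the content of communication, so that vocabularies specific to men and to women in the online, non-verbal environment can be compiled?
     Gender differences among users in the online environment include not only differences in attitudes toward computers and in intentions of use, but also differences that arise in online shopping, such as assessments of risk, service, and quality, and responses to recommendation mechanisms. More importantly, this non-face-to-face mode of communication makes differences that may be hidden in face-to-face exchanges more apparent, including word choice, tone, and the expression of emotion.
     To verify gender differences in language use under online communication, this study takes posts from the automobile and stock forums of Tianya as its data. Hypothesis tests on users' posting behavior reveal behavioral differences between users of different genders. Classifying the post content with several text classification algorithms shows that gender can be classified from post content and identifies the algorithms that classify most effectively. Association rule mining shows that users of different genders focus on different product attributes, which gives the relevant manufacturers a reference for formulating marketing strategies. Finally, a word-similarity formula is combined with feature extraction and mutual information to build male and female lexicons.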
People's lives have changed greatly with the development of network technology, and so have established lifestyles, business models, and communication models. People now use many kinds of communication tools to communicate instantly. Does this non-verbal communication show gender differences similar to those in verbal communication? Is it possible to classify gender through text classification and to compile male-specific and female-specific vocabularies for the non-verbal communication environment? Moreover, non-verbal communication magnifies differences that people hide in ordinary life, through word usage, the way feelings are expressed, and so on.
     Gender differences in the network environment include not only attitudes toward computers and willingness to use them, but also shopping behavior in electronic commerce. To verify the existence of gender differences in the non-verbal communication environment, I selected message content from the car and stock forums on TianYa. I used hypothesis testing to examine users' posting behavior, several text classification methods to classify gender, and association rule mining to extract the car and stock feature words associated with male and female users. With these feature words, the relevant manufacturers can formulate marketing strategies. Finally, word similarity and the mutual information of features were used to build gender dictionaries.
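     The abstract does not give the formula used for the mutual information step, so the following is only a minimal sketch of one common choice, pointwise mutual information between a word and a gender label, computed from document frequencies. The function name `gender_pmi`, the labels, and the toy posts are hypothetical and are not taken from the thesis; real forum posts in Chinese would first need word segmentation, which is assumed to have been done already.

```python
import math
from collections import Counter


def gender_pmi(posts):
    """Score each (word, label) pair by pointwise mutual information.

    posts: list of (label, tokens) pairs, where tokens is a list of
    already-segmented words. This input shape is an assumption made
    for illustration, not the thesis's data format.
    Returns {word: {label: PMI in bits}} for pairs that co-occur.
    """
    n_posts = len(posts)
    label_counts = Counter(label for label, _ in posts)
    word_counts = Counter()   # number of posts containing the word
    joint_counts = Counter()  # number of posts with the word and the label

    for label, tokens in posts:
        for word in set(tokens):  # count each word once per post
            word_counts[word] += 1
            joint_counts[(word, label)] += 1

    scores = {}
    for (word, label), joint in joint_counts.items():
        p_joint = joint / n_posts
        p_word = word_counts[word] / n_posts
        p_label = label_counts[label] / n_posts
        pmi = math.log2(p_joint / (p_word * p_label))
        scores.setdefault(word, {})[label] = pmi
    return scores


# Toy data, invented purely to exercise the function.
toy_posts = [
    ("male", ["engine", "price", "torque"]),
    ("male", ["engine", "stock"]),
    ("female", ["color", "price", "cute"]),
    ("female", ["color", "stock"]),
]
toy_scores = gender_pmi(toy_posts)
```

     In this toy data, a word that appears only in posts from one gender gets a positive score for that label, while a word spread evenly across both genders scores zero for each. Candidate lexicon entries would be the words with the highest score for a label. Any threshold and the later word-similarity expansion are left out, because the abstract does not specify them.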
