Research on Public Sentiment Analysis of Hot Events in Microblogs
Abstract
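Microblogging (hereafter "microblogs") is an emerging Internet social medium in which users exchange personal experiences and opinions in real time as short texts and express their sentiment toward social events. Discovering hot events in microblogs and analyzing their sentiment has considerable practical importance, because it supports a correct assessment of netizens' public opinion. Analyzing microblogs is difficult, however. The texts are short, diverse in content, free in form, and often linguistically nonstandard. Research on hot event discovery and sentiment analysis in microblogs is therefore of particular value. The main contributions of this thesis are as follows: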
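     Microblog hashtagging behavior is used as a clue for discovering hot events, which are then found by classifying hashtags. The method proposes three measures: degree of instability, degree of online-topic (meme) possibility, and hashtag authorship entropy. It exploits the fact that hashtags reflect the topic of a microblog without depending on its text content. This lets it distinguish breaking hot events from popular online memes and from advertising or marketing content. The approach overcomes the shortcomings of two earlier families of methods: traditional burst detection relies entirely on changes in counts, while topic detection depends on textual semantics. In a multilingual microblog environment, classification thus surfaces breaking hot events tied to real-world social events and removes the noise introduced by popular online memes and advertising. Experiments show that the method classifies microblog topics better than existing hashtag classification algorithms.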
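The abstract does not define the three hashtag measures. As a minimal sketch of the third, hashtag authorship entropy, assume it is the Shannon entropy of how a hashtag's posts are spread over their authors. Under that assumption, a tag pushed by one or two accounts (as a marketing campaign might be) scores near zero, while a tag taken up by many independent users scores high. The function name and input format are illustrative, not taken from the thesis.

```python
import math
from collections import Counter
from typing import Iterable


def authorship_entropy(authors: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the author distribution of one hashtag.

    `authors` lists the author ID of every post that carries the hashtag,
    one entry per post (a hypothetical input format).
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


print(authorship_entropy(["shop_account"] * 8))             # 0.0
print(authorship_entropy([f"user{i}" for i in range(8)]))   # 3.0
```

Eight posts from a single account give 0 bits. Eight posts from eight distinct authors give log2 8 = 3 bits.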
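     Based on the basic forms in which emotion is expressed in microblogs, a method for sentiment lexicon construction and sentiment analysis using emotion tokens is proposed. Such tokens are units of emotional expression suited to informal Internet text, including generalized emoticons, repeated punctuation, and words with repeated letters. Using the co-occurrence of emotion tokens in microblog texts, a sentiment lexicon can be constructed automatically by iterative propagation. Traditional methods rely on conventional sentiment words tied to a single language or domain. This method instead relies on the emotion tokens characteristic of multi-topic microblogs, and experiments show that it performs better in multilingual microblog sentiment classification.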
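The emotion tokens described above can be located with regular expressions. The patterns below are a minimal illustrative assumption, covering four cases:

- a few Western emoticons;
- bracketed Sina Weibo emoticon codes such as [哈哈];
- runs of three or more identical punctuation marks;
- Latin-letter words in which some letter is repeated three or more times.

The thesis's actual inventory of "generalized emoticons" is not given in the abstract.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; not the thesis's token definitions.
PATTERNS = {
    # A few Western emoticons plus bracketed Weibo-style codes such as [哈哈]
    "emoticon": re.compile(r"[:;=]-?[()DPp]|\^_\^|T_T|\[[^\[\]\s]{1,8}\]"),
    # Three or more identical punctuation marks, e.g. "!!!" or "？？？"
    "repeated_punct": re.compile(r"([!?！？。~])\1{2,}"),
    # A Latin-letter word containing some letter three or more times in a row
    "repeated_letters": re.compile(
        r"(?<![A-Za-z])[A-Za-z]*([A-Za-z])\1{2,}[A-Za-z]*(?![A-Za-z])"),
}


def extract_emotion_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (token_type, token) pairs in order of appearance in `text`."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), kind, m.group(0)))
    found.sort()  # order by start offset
    return [(kind, token) for _, kind, token in found]


print(extract_emotion_tokens("Sooooo cool!!! :-) [哈哈]"))
# [('repeated_letters', 'Sooooo'), ('repeated_punct', '!!!'), ('emoticon', ':-)'), ('emoticon', '[哈哈]')]
```

The letter pattern uses lookarounds instead of `\b`. Python treats CJK characters as word characters, so a `\b`-based pattern would miss a word like "coooool" written directly after a Chinese character.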
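     For event-related Chinese microblogs, new-word discovery methods are combined to construct a sentiment lexicon suited to this environment, which is then used for event sentiment analysis. The method compensates for the weaknesses of word segmentation tools built for formal text when they are applied to informal text. It computes the sentiment polarity of Internet neologisms, emoticon icons, misspelled words, and named entities. The resulting entries capture new senses that old words acquire online, as well as netizens' evaluations of entities, neither of which traditional sentiment lexicons cover. Experiments show that sentiment classification improves when the microblog sentiment lexicon is incorporated. Moreover, different seed words can be used to construct lexicons for different emotions (such as joy, anger, sadness, fear, and surprise). This goes beyond the traditional positive/negative dichotomy and makes microblog sentiment analysis more fine-grained and varied.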
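The abstract does not say which new-word discovery method is used. A widely used unsupervised technique is substituted here purely for illustration. It scores each candidate character n-gram by its frequency and by the entropy of the characters immediately to its left and right. The intuition is that a real word, such as an Internet neologism, occurs in many different contexts, so both of its neighbor entropies are high. Practical systems usually add an internal-cohesion score such as pointwise mutual information, which is omitted here. The function name and thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def _entropy(counter: Counter) -> float:
    """Shannon entropy (bits) of a neighbor-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def branching_entropy_candidates(texts: List[str], max_len: int = 4, min_freq: int = 3,
                                 min_entropy: float = 1.0) -> Dict[str, float]:
    """Return n-grams (2..max_len characters) whose left and right neighbor
    entropies both reach `min_entropy`, mapped to the smaller of the two."""
    freq: Counter = Counter()
    left: Dict[str, Counter] = defaultdict(Counter)
    right: Dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        padded = f"^{text}$"  # boundary markers count as neighbors
        for n in range(2, max_len + 1):
            for i in range(1, len(padded) - n):  # keep grams inside the markers
                gram = padded[i:i + n]
                freq[gram] += 1
                left[gram][padded[i - 1]] += 1
                right[gram][padded[i + n]] += 1
    result = {}
    for gram, f in freq.items():
        if f < min_freq:
            continue
        score = min(_entropy(left[gram]), _entropy(right[gram]))
        if score >= min_entropy:
            result[gram] = score
    return result


posts = ["今天真给力啊", "这个很给力", "给力的比赛", "太给力了吧"]
print(branching_entropy_candidates(posts))  # {'给力': 2.0}
```

In this toy corpus, the slang word 给力 has four distinct left neighbors and four distinct right neighbors, so both entropies are 2 bits. No other n-gram occurs three or more times.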
Microblogging is a new social medium on the Internet. Microblog users share their personal experiences and opinions in short texts and express their attitudes towards social events. It is therefore important to discover popular and breaking events in microblogs and to analyze the public sentiment about them. However, microblog analysis is difficult: microblog texts are short, cover diverse topics, and are written in free, usually informal forms. Research on public sentiment analysis of events in microblogs is thus of real academic value. The main contributions of this thesis include:
     Using hashtags as clues to discover breaking events in microblogs. This method introduces three measures on hashtags: Instability, Twitter Meme Possibility, and Authorship Entropy. Hashtags are closely related to the topic of a text but do not depend on its words. By classifying hashtags, the method recognizes breaking popular events related to real-world social events and removes the noise introduced by online memes and advertisements in multilingual microblog messages. It overcomes the limitations of traditional burst detection methods, which ignore text content, and of topic detection methods, which rely on semantic information. Experimental results show that it achieves higher classification performance than other hashtag classification methods.
     Using emotion tokens as sentiment units to construct a sentiment lexicon for sentiment analysis. The emotion tokens include emotion symbols, repeated letters, and repeated punctuation, which occur frequently in informal Internet text. Their co-occurrences can be used to construct sentiment lexicons automatically with label propagation algorithms. Compared with traditional methods, the proposed method exploits the emotion tokens typical of microblogs. Because it is not restricted to any single language or domain, it performs better in multilingual microblog sentiment analysis, as experimental results indicate.
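The abstract names label propagation but gives no details. The following sketch is built on three assumptions:

- the graph is an undirected co-occurrence graph whose edge weight is the number of posts containing both tokens;
- a few seed tokens carry fixed polarity scores of +1 or -1;
- propagation follows the classic clamped iteration, in which every unlabeled node repeatedly takes the weighted average of its neighbors' scores.

The posts, seeds, and fixed iteration count are hypothetical. A convergence threshold would be the more usual stopping rule.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def build_cooccurrence(posts: Iterable[List[str]]) -> Dict[str, Dict[str, float]]:
    """Undirected graph; edge weight = number of posts containing both tokens."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tokens in posts:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def propagate(graph: Dict[str, Dict[str, float]], seeds: Dict[str, float],
              iterations: int = 50) -> Dict[str, float]:
    """Clamped label propagation: each unlabeled node takes the weighted
    average score of its neighbors; seed nodes keep their given scores."""
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp labeled nodes
                continue
            total = sum(neighbors.values())
            updated[node] = sum(w * scores[nb] for nb, w in neighbors.items()) / total
        scores = updated
    return scores


posts = [["哈哈", ":)", "!!!"], ["哈哈", ":)"], ["呜呜", ":("], ["呜呜", ":(", "!!!"]]
scores = propagate(build_cooccurrence(posts), seeds={":)": 1.0, ":(": -1.0})
```

On this toy input, 哈哈 converges to about 0.67 and 呜呜 to about -0.67. The token !!! co-occurs equally with both seeds, so it stays near 0.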
     Addressing the characteristics of event-related Chinese microblogs, constructing Chinese microblog sentiment lexicons with out-of-vocabulary (OOV) word discovery methods for the sentiment analysis of events. The proposed method reduces the errors introduced by traditional word segmentation tools and semantic dependencies. It discovers the sentiment polarities of OOV words, animated emoticons, misspelled words, and named entities. These reflect the public opinions of microblog users but are typically excluded from traditional sentiment lexicons. Experimental results show that performance improves when entries from the constructed sentiment lexicon are used. In addition, the method can build lexicons with more dimensions (such as happiness, anger, sadness, fear, and surprise) than positive and negative sentiment alone.
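The abstract says only that different seed words yield lexicons for different emotions. One simple way to score a candidate word against several emotion seed sets is to average its pointwise mutual information (PMI) with each set's seeds and pick the best-scoring emotion. This is a PMI-based semantic-orientation scheme, substituted here as an assumption rather than offered as the thesis's method. Co-occurrence is counted per post, and the smoothing constant `eps`, the seed lists, and the toy posts are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def emotion_scores(posts: List[Set[str]], word: str,
                   seed_sets: Dict[str, List[str]], eps: float = 0.01) -> Dict[str, float]:
    """Average smoothed PMI between `word` and the seed words of each emotion.

    `eps` is a hypothetical smoothing constant that keeps log2 finite when
    the word never co-occurs with a seed.
    """
    n = len(posts)
    df = Counter(tok for post in posts for tok in post)  # posts containing each token
    scores: Dict[str, float] = {}
    for emotion, seeds in seed_sets.items():
        pmis = []
        for seed in seeds:
            if df[word] == 0 or df[seed] == 0:
                continue  # no evidence for an unseen word or seed
            joint = sum(1 for post in posts if word in post and seed in post)
            pmis.append(math.log2((joint + eps) * n / (df[word] * df[seed])))
        scores[emotion] = sum(pmis) / len(pmis) if pmis else float("-inf")
    return scores


posts = [{"给力", "哈哈"}, {"给力", "开心"}, {"坑爹", "愤怒"}, {"坑爹", "怒"}, {"哈哈", "开心"}]
seed_sets = {"joy": ["哈哈", "开心"], "anger": ["愤怒", "怒"]}
for w in ["给力", "坑爹"]:
    s = emotion_scores(posts, w, seed_sets)
    print(w, max(s, key=s.get))
```

On this toy corpus, 给力 is assigned joy and 坑爹 is assigned anger. Adding sadness, fear, or surprise only requires another entry in `seed_sets`.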
