Efficient temporal mining of micro-blog texts and its application to event discovery
  • Authors: Giovanni Stilo; Paola Velardi
  • Keywords: Event detection; Temporal mining; Symbolic Aggregate ApproXimation; Microblog analysis
  • Journal: Data Mining and Knowledge Discovery
  • Publication year: 2016
  • Publication date: March 2016
  • Volume: 30
  • Issue: 2
  • Pages: 372-402
  • Full-text size: 1,619 KB
  • Author affiliations: Giovanni Stilo (1)
    Paola Velardi (1)

    1. Department of Computer Science, Sapienza University of Roma, Via Salaria 113, Rome, Italy
  • Journal category: Computer Science
  • Journal subjects: Data Mining and Knowledge Discovery
    Computing Methodologies
    Artificial Intelligence and Robotics
    Statistics
    Statistics for Engineering, Physics, Computer Science, Chemistry and Geosciences
    Information Storage and Retrieval
  • Publisher: Springer Netherlands
  • ISSN: 1573-756X
Abstract
In this paper we present a novel method for clustering words in micro-blogs, based on the similarity of the words' temporal series. Our technique, named SAX*, uses the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of each term into a small set of levels, yielding one string per term. We then define a subset of “interesting” strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect co-occurring clusters of tokens with the same or similar string. To assess the performance of the method, we first tune the model parameters on a 2-month 1% Twitter stream, during which a number of world-wide events of differing type and duration (sports, politics, disasters, health, and celebrities) occurred. Then, we evaluate the quality of all discovered events in a 1-year stream, “googling” the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. Finally, we perform a complexity evaluation and compare SAX* with three alternative methods for event discovery. Our evaluation shows that SAX* is at least one order of magnitude less complex than other temporal and non-temporal approaches to micro-blog clustering.
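
The discretization step described in the abstract can be illustrated with a minimal sketch of the standard Symbolic Aggregate ApproXimation procedure: z-normalize a term's frequency series, average it over equal-width segments, and map each segment mean to a symbol using equiprobable Gaussian breakpoints. This is not the authors' SAX* implementation: the "interesting" string filter and the sliding windows are omitted, and the function names `sax_word` and `group_by_word`, the two-letter alphabet, and the toy series are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="ab"):
    """Return the SAX string of a numeric series (illustrative helper)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A flat series carries no shape information; treat it as all zeros.
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]

    # Piecewise aggregate approximation: mean of each equal-width segment.
    # Assumes len(series) is a multiple of n_segments; extra points are ignored.
    width = len(normalized) // n_segments
    paa = [mean(normalized[i * width:(i + 1) * width]) for i in range(n_segments)]

    # Breakpoints split the standard normal into len(alphabet) equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Each segment gets the symbol of the bin its mean falls into.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def group_by_word(term_series, n_segments, alphabet="ab"):
    """Group terms whose series map to the same SAX string."""
    groups = defaultdict(list)
    for term, series in term_series.items():
        groups[sax_word(series, n_segments, alphabet)].append(term)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical per-slot term counts inside one temporal window.
    window = {
        "goal": [0, 0, 0, 0, 10, 10, 0, 0],
        "match": [1, 1, 1, 1, 9, 9, 1, 1],
        "rain": [5, 5, 0, 0, 0, 0, 0, 0],
    }
    print(group_by_word(window, n_segments=4))
```

Because each series is normalized before discretization, terms with different absolute frequencies but the same temporal shape (here `goal` and `match`) receive the same string, which is what makes grouping by string a cheap proxy for clustering by temporal similarity.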
