Real-Time Data Analytics: An Algorithmic Perspective
  • Keywords: Big data; Real-time data analytics; Machine learning algorithms; Large-scale stream data processing
  • Publication title: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 9714
  • Issue: 1
  • Pages: 311-320
  • Author affiliations: Sarwar Jahan Morshed (15)
    Juwel Rana (15) (16)
    Marcelo Milrad (15)

    15. Department of Media Technology, Linnaeus University, Växjö, Sweden
    16. Telenor Group, Oslo, Norway
  • Book series title: Data Mining and Big Data
  • ISBN: 978-3-319-40973-3
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN:1611-3349
Abstract
Massive volumes of data are continuously generated by a wide variety of digital services and infrastructures. Examples include machine and system logs, retail transaction logs, traffic tracing data, and diverse social data from social networks and mobile interactions. To mention a few cases: the New York Stock Exchange currently produces 1 TB of data per day, Google processes 700 PB of data per month, and Facebook hosts 10 billion photos occupying 1 PB of storage. Turning these streaming data flows into actionable real-time insights is not a trivial task. Using data in real time can change many aspects of a corporation's business logic, including real-time decision making and resource optimization. In this paper, we present an analysis of different aspects of real-time data analytics from an algorithmic perspective. One goal of the paper is to identify new problems in this domain and gain new insights, so that the outcomes of our efforts and these challenges can be shared with the research community working on real-time data analytics algorithms.
