Real-Time Data Analytics: An Algorithmic Perspective

详细信息查看全文

关键词：Big data ; Real ; time data analytics ; Machine learning algorithms ; Large ; scale stream data processing
刊名：Lecture Notes in Computer Science
出版年：2016
出版时间：2016
年：2016
卷：9714
期：1
页码：311-320
全文大小：362 KB
参考文献：1.Cha, S., Monica, W.: Developing a real-time data analytics framework using Hadoop. In: IEEE International Congress on BigData. IEEE (2015)
2.Singh, D., Reddy, C.K.: A survey on platforms for big data analytics. J. Big Data 2, 8 (2015)CrossRef
3.Yang, F., Tschetter, E., Léauté, X., Ray, N., Merlino, G., Ganguli, D.: Druid: a real-time analytical data store. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, June 2014, pp. 157–168 (2014)
4.Morshed, S.J., Rana, J., Milrad, M.: Open source initiatives and frameworks addressing distributed real-time data analytics. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, Illinois, Chicago, USA (2016). doi:10.1109/IPDPSW.2016.152
5.Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: On clustering massive data streams: a summarization paradigm. In: Aggarwal, C.C. (ed.) Data Streams: Models and Algorithms, vol. 31, pp. 9–38. Springer, New York (2007)CrossRef
6.Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD, pp. 419–429 (1994)
7.Keim, D.A., Krstajic, M., Rohrdantz, C., Schreck, T.: Real-time visual analytics for text streams. Computer 46(7), 47–55 (2013)CrossRef
8.Tripathy, B.K., Manusha, G.V., Mohisin, G.S.: An improved set-valued data anonymization algorithm and generation of FP-Tree. In: Venugopal, K.R., Patnaik, L.M. (eds.) ICIP 2012. CCIS, vol. 292, pp. 552–560. Springer, Heidelberg (2012)CrossRef
9.Xie, J., Yang, J.: A survey of join processing in data streams. In: Aggarwal, C.C. (ed.) Data Streams: Models and Algorithms. Advances in Database Systems, vol. 31, pp. 209–236. Springer, New York (2007). ISBN: 10:0-387-28759-0CrossRef
10.Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: ACM PODS Conference (2002)
11.Guha, S., Rastogi, R., Shim, K.: CURE: an efficient clustering algorithm for large databases. In: ACM SIGMOD Conference (1998)
12.Aggarwal, C., Procopiuc, C., Wolf, J., Yu, P., Park, J.-S.: Fast algorithms for projected clustering. In: ACM SIGMOD Conference (1999)
13.Aggarwal, C.C.: A survey of change diagnosis algorithms in evolving data streams. In: Models and Algorithms, pp. 85–102. IBM (2007)
14.Gaber, M.M., Zaslavsky, A., Krishnaswamy, S.: A survey of classification methods in data streams. In: Aggarwal, C.C. (ed.) Data Streams: Models and Algorithms, vol. 31, pp. 39–59. Springer, New York (2007)CrossRef
15.Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2001)CrossRef MATH
16.Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD Conference (1993)
17.Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.: Mining frequent patterns in data streams at multiple time granularities. In: Proceedings of the NSF Workshop on Next Generation Data Mining (2002)
18.Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD Conference on Management of Data (2000)
19.Xifeng, Y., Han, J.: GSPAN: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), p. 721 (2002)
20.Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 71–80 (2002)
21.Fiat, A., Woeginger, G.J.: Online Algorithms: The State of the Art. LNCS, vol. 1442. Springer, Heidelberg (1998)MATH
22.Shalev-Shwartz, S.: Online Learning: Theory, Algorithms, and Applications, The Hebrew University of Jerusalem. Ph.D. thesis (2014)
23.Littlestone, N., Warmuth, M.: Relating data compression and learn ability. Unpublished Manuscript, November 1986
24.Ikonomovska, E., Mariano, Z.: Algorithmic techniques for processing data streams. In: Data Exchange, Information, and Streams, pp. 237–274 (2013)
25.Krauth, W., Mezard, M.: Learning algorithms with optimal stability in neural networks. J. Phys. A 20, 745 (1987)MathSciNet CrossRef
26.Ben-David, S., Kushilevitz, E., Mansour, Y.: Online learning versus offline learning. Mach. Learn. 29, 45–63 (1997). Kluwer Academic Publishers, NetherlandsCrossRef MATH
27.Shoorehdeli, M.A., Teshnehlab, M., Sedigh, A.K.: Novel hybrid learning algorithms for tuning ANFIS parameters using adaptive weighted PSO. In: IEEE International on Fuzzy Systems Conference, FUZZ-IEEE 2007, London, pp. 1–6 (2007)
28.Nguyen, T., Schiefer, J., Tjoa, M.A.: Sense & response service architecture (SARESA): an approach towards a real-time business intelligence solution and its use for a fraud detection application. In: Proceedings of DOLAP 2005, pp. 77–86. ACM, New York (2005)
29.Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of KDD 2002, pp. 279–288. ACM, New York (2002)
30.Shamir, O.: Fundamental limits of online and distributed algorithms for statistical learning and estimation (2013). CoRR: abs/1311.3494
作者单位：Sarwar Jahan Morshed (15)
Juwel Rana (15) (16)
Marcelo Milrad (15)

15. Department of Media Technology, Linnaeus University, Växjö, Sweden
16. Telenor Group, Oslo, Norway
丛书名：Data Mining and Big Data
ISBN：978-3-319-40973-3
刊物类别：Computer Science
刊物主题：Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
出版者：Springer Berlin / Heidelberg
ISSN：1611-3349
卷排序：9714

文摘

Massive amount of data sets are continuously generated from a wide variety of digital services and infrastructures. Examples of those are machine/system logs, retail transaction logs, traffic tracing data and diverse social data coming from different social networks and mobile interactions. Currently, the New York stock exchange produces 1 TB data per day, Google processes 700 PB of data per month and Facebook hosts 10 billion photos taking 1 PB of storage just to mention some cases. Turning these streaming data flow into actionable real-time insights is not a trivial task. The usage of data in real-time can change different aspects of the business logic of any corporation including real time decision making, resource optimization, and so on. In this paper, we present an analysis of different aspects related to real-time data analytics from an algorithmic perspective. Thus, one of the goals of this paper is to identify some new problems in this domain and to gain new insights in order to share the outcomes of our efforts and these challenges with the research community working on real-time data analytics algorithms.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700