A data transmission algorithm for distributed computing system based on maximum flow
详细信息    查看全文
  • 作者:Xiaolu Zhang ; Jiafu Jiang ; Xiaotong Zhang ; Xuan Wang
  • 关键词:Distributed computing system ; Minimize bandwidth usage ; Data transmission time
  • 刊名:Cluster Computing
  • 出版年:2015
  • 出版时间:September 2015
  • 年:2015
  • 卷:18
  • 期:3
  • 页码:1157-1169
  • 全文大小:1,344 KB
  • 参考文献:1.Apache hadoop. http://鈥媤ww.鈥媓adoop.鈥媋pache.鈥媜rg
    2.Ahuja, R.K., Goldberg, A.V., Orlin, J.B., Tarjan, R.E.: Finding minimum-cost flows by double scaling. Math. Program. 53(1鈥?), 243鈥?66 (1992)MATH MathSciNet CrossRef
    3.Buyya, R.: Parmon: a portable and scalable monitoring system for clusters. Software 30(7), 723鈥?40 (2000)MATH
    4.Cherkassky, B.V., Goldberg, A.V.: On implementing the pushrelabel method for the maximum flow problem. Algorithmica 19(4), 390鈥?10 (1997)MATH MathSciNet CrossRef
    5.Christiano, P., Kelner, J.A., Madry, A., Spielman, D.A., Teng, S.H.: Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs. In: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pp. 273鈥?82. ACM Press, San Jose (2011)
    6.Cidon, A., Rumble, S., Stutsman, R., Katti, S., Ousterhout, J., Rosenblum, M.: Copysets: reducing the frequency of data loss in cloud storage. In: Presented as part of the 2013 USENIX Annual Technical Conference, pp. 37鈥?8. USENIX (2013)
    7.Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.L., et al.: Introduction to Algorithms. MIT Press, Cambridge (2001)MATH
    8.Dean, J., Ghemawat, S.: Mapreduce: a flexible data processing tool. Commun. ACM 53(1), 72鈥?7 (2010). doi:10.鈥?145/鈥?629175.鈥?629198 CrossRef
    9.Dinic, E.: Algorithm for solution of a problem of maximum flow in a network with power estimation. Soviet Math. Doll. 11(5), 1277鈥?280, (1970) (English translation by RF. Rinehart)
    10.Edmonds, J., Karp, R.M.: Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM (JACM) 19(2), 248鈥?64 (1972)MATH CrossRef
    11.Ford, D., Fulkerson, D.R.: Flows in Networks. Princeton University Press, Princeton (2010)
    12.Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8(3), 399鈥?04 (1956)MATH MathSciNet CrossRef
    13.Goldberg, A.V., Rao, S.: Beyond the flow decomposition barrier. J. ACM (JACM) 45(5), 783鈥?97 (1998)MATH MathSciNet CrossRef
    14.Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Handling Data Skew in Mapreduce, pp. 574鈥?83. Eindhoven University of Technology, Noordwijkerhout (2011)
    15.Helal, A.S., Yuan, D., Hesham, E.R.: Dynamic data reallocation for skew management in shared-nothing parallel databases. Distrib. Parallel Databases 5(3), 271鈥?88 (1997)
    16.Hsiao, H.C., Chung, H.Y., Shen, H., Chao, Y.C.: Load rebalancing for distributed file systems in clouds. IEEE Trans. Parallel Distrib. Syst. 24(5), 951鈥?62 (2013). doi:10.鈥?109/鈥婽PDS.鈥?012.鈥?96 CrossRef
    17.Ibrahim, S., Jin, H., Lu, L., He, B., Antoniu, G., Wu, S.: Handling partitioning skew in mapreduce using leen. Peer-to-Peer Netw. Appl. 6(4), 409鈥?24 (2013)CrossRef
    18.Imamagic, E., Dobrenic, D.: Grid infrastructure monitoring system based on nagios. In: Proceedings of the 2007 Workshop on Grid Monitoring, pp. 23鈥?8. ACM Press, New York (2007)
    19.Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. SIGOPS Oper. Syst. Rev. 41(3), 59鈥?2 (2007). doi:10.鈥?145/鈥?272998.鈥?273005 CrossRef
    20.Jin, J., Luo, J., Song, A., Dong, F., Xiong, R.: Bar: an efficient data locality driven task scheduling algorithm for cloud computing. In: Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on, pp. 295鈥?04. IEEE Press, New York (2011)
    21.Kliazovich, D., Bouvry, P., Khan, S.U.: Dens: data center energy-efficient network-aware scheduling. Clust. Comput. 16(1), 65鈥?5 (2013)CrossRef
    22.Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 25鈥?6. ACM Press, New York (2012)
    23.Li, M., Subhraveti, D., Butt, A.R., Khasymski, A., Sarkar, P.: Cam: a topology aware minimum cost flow based resource manager for mapreduce applications in the cloud. In: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, pp. 211鈥?22. ACM Press, Hoboken (2012)
    24.Lu, H., Yu, J.X., Feng, L., Li, Z.: Fully dynamic partitioning: handling data skew in parallel data cube computation. Distrib. Parallel Databases 13(2), 181鈥?02 (2003)MATH CrossRef
    25.M盲rtens, H.: A classification of skew effects in parallel database systems. In: Euro-Par 2001 Parallel Processing, pp. 291鈥?00. Springer, New York (2001)
    26.Massie, M.L., Chun, B.N., Culler, D.E.: The ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput. 30(7), 817鈥?40 (2004)CrossRef
    27.Run-liu, W., Yun-hui, Y.: Low cost network coding algorithm for data distribution network. In: Proceedings of 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), pp. 1鈥? (2012). doi:10.鈥?109/鈥媁iCOM.鈥?012.鈥?478566
    28.Schrijver, A.: On the history of combinatorial optimization (till 1960). Handbook of Discrete Optimization pp. 1鈥?8 (2005)
    29.Slagter, K., Hsu, C.H., Chung, Y.C., Yi, G.: Smartjoin: a network-aware multiway join for mapreduce. Clust. Comput. 17, 1鈥?3 (2014)CrossRef
    30.Spielman, D.A., Teng, S.H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the 36h Annual ACM Symposium on Theory of Computing, pp. 81鈥?0. ACM Press, New York (2004)
    31.Vygen, J.: On dual minimum cost flow algorithms. Math. Methods Oper. Res. 56(1), 101鈥?26 (2002)MATH MathSciNet CrossRef
    32.Yook, J., Tilbury, D.: Performance evaluation of distributed control systems with reduced communication. Ann Arbor 1001, 48,109 (2001)
    33.Yook, J.K., Tilbury, D.M., Soparkar, N.R.: Trading computation for bandwidth: reducing communication in distributed control systems using state estimators. IEEE Trans. Control Syst. Technol. 10(4), 503鈥?18 (2002)CrossRef
  • 作者单位:Xiaolu Zhang (1)
    Jiafu Jiang (1)
    Xiaotong Zhang (1)
    Xuan Wang (1)

    1. University of Science and Technology Beijing, Beijing, China
  • 刊物类别:Computer Science
  • 刊物主题:Processor Architectures
    Operating Systems
    Computer Communication Networks
  • 出版者:Springer Netherlands
  • ISSN:1573-7543
文摘
Data skew can lead to load imbalance and longer computation time in the distributed computing system. To avoid data skew and reduce the data computation time, it is necessary to transmit the data to appropriate machines, this may however take too much network resources. How to balance the computational resources and the network resources is a problem. In this paper, we introduce a computation model called distributed two-phase model, in which the process of a task can be divided into two independent phases: data transmission and data computation. Suppose an upper bound of relative computation time is given, we show how to schedule data transmission with minimum resources, such as data transmission time and occupied bandwidth, to meet the demand. In this paper, we present a novel algorithm to minimize data transmission time and network bandwidth usage in the data transmission phase, with the conditions that an upper bound of relative computation time of data computation phase is given. Moreover, the number of nodes that participate in data computation phase is also reduced, in this way, the computational resources are saved. The simulation results show that the occupied bandwidth can be reduced effectively (about 70 %) in the situation of large-scale data sets and large number of nodes. Our algorithm is also shown to be available in replication situation. Keywords Distributed computing system Minimize bandwidth usage Data transmission time
NGLC 2004-2010.National Geological Library of China All Rights Reserved.
Add:29 Xueyuan Rd,Haidian District,Beijing,PRC. Mail Add: 8324 mailbox 100083
For exchange or info please contact us via email.