BioLab: A Grid System for Biological Computing Services
Abstract
Bioinformatics has become one of the leading forces changing the way science is conducted, and grid computing offers a powerful computing and storage platform for bioinformatics applications. However, grid applications must first solve problems such as service integration, resource heterogeneity, job management, and scheduling, while hiding this complex underlying implementation from biologists. A Web portal that acts as an intermediary is a feasible solution.
     We introduce BioLab, a bioinformatics portal for the grid environment. The system defines three user roles: service providers, service deployers, and service users. It provides a mechanism for integrating services and resources in the grid environment and implements user management, job management, and job scheduling control. It also exposes a standard (non-GT) Web Service API to improve extensibility.
     We analyze heuristic scheduling algorithms and choose online scheduling rather than batch scheduling to improve system response speed. The experiments show three results. First, when the average job arrival time is large, non-blocking scheduling yields a smaller total job completion time (makespan) than blocking scheduling. Second, as the average job arrival time increases, the total job running time also increases, but at a slowing rate. Third, when the average job arrival time is small, the online scheduling heuristics perform almost identically, whereas when it is large, the random scheduling algorithm (given good randomness) and the max-available-memory-first scheduling algorithm outperform the others, achieving a smaller total job completion time.
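The two online heuristics named above can be illustrated with a minimal sketch. This is not BioLab's implementation: the `Node` and `Job` types, the memory figures, and the function names are all hypothetical, and the sketch assumes that each job is placed the moment it arrives (online scheduling) and that a node is eligible only if it has enough free memory. Jobs never finish or release memory here, so the sketch shows placement decisions only, not completion times.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical compute node; only free memory is modeled."""
    name: str
    free_memory_mb: int


@dataclass
class Job:
    """Hypothetical job with a fixed memory requirement."""
    name: str
    memory_mb: int


def eligible(nodes, job):
    # Nodes that currently have enough free memory for the job.
    return [n for n in nodes if n.free_memory_mb >= job.memory_mb]


def pick_random(nodes, job, rng):
    # Random heuristic: choose uniformly among eligible nodes.
    candidates = eligible(nodes, job)
    return rng.choice(candidates) if candidates else None


def pick_max_free_memory(nodes, job, rng=None):
    # Max-available-memory-first: choose the eligible node with the most free memory.
    candidates = eligible(nodes, job)
    return max(candidates, key=lambda n: n.free_memory_mb) if candidates else None


def schedule_online(jobs, nodes, pick, rng=None):
    # Place each job as it arrives, without waiting to collect a batch.
    # Returns (job name, node name) pairs; the node name is None if no node fits.
    placement = []
    for job in jobs:
        node = pick(nodes, job, rng)
        if node is not None:
            node.free_memory_mb -= job.memory_mb
        placement.append((job.name, node.name if node else None))
    return placement


def demo_max_free_memory():
    nodes = [Node("a", 4096), Node("b", 8192)]
    jobs = [Job(f"j{i}", 3000) for i in range(1, 5)]
    return schedule_online(jobs, nodes, pick_max_free_memory)


def demo_random(seed=0):
    nodes = [Node("a", 4096), Node("b", 8192)]
    jobs = [Job(f"j{i}", 3000) for i in range(1, 4)]
    return schedule_online(jobs, nodes, pick_random, random.Random(seed))
```

Both heuristics share the same `schedule_online` loop and differ only in the `pick` function, which makes it straightforward to compare them on one workload.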
