Research on a Fault-Tolerant Execution Model for Migrating Workflow and Its Implementation Methods

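English title: Study on Fault-Tolerant Execution Model and Implement Methods in the Migrating Workflow System
Author: Lu Zhaoxia (卢朝霞)
Thesis level: Doctoral
Discipline: Computer Software and Theory
Keywords (translated from Chinese): workflow management ; migrating workflow ; migrating instance ; fault tolerance ; state monitoring ; fault recovery
Keywords (English): workflow management ; migrating workflow ; migrating instance ; fault tolerance ; state monitoring ; recovery
Degree year: 2009
Advisor: Zeng Guangzhou (曾广周)
Discipline code: 081202
Degree-granting institution: Shandong University (山东大学)
Thesis submission date: 2009-04-05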

Abstract

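Migrating workflow is a class of workflow management technology based on the mobile-agent computing paradigm. It builds one or more task-executing entities (called migrating instances) on the mobile-agent paradigm and uses work places to map the network nodes and services of workflow participants: a network node is the site where a migrating instance works, and the services of a work place comprise runtime services and workflow services. A migrating instance can use local resources and services at a work place to execute one or more tasks and, when necessary, migrate with its task specification and current results to another work place that meets its requirements and continue working there. Multiple migrating instances created for the same workflow can work cooperatively to support the management of parallel business processes.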
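     The migrating workflow management system model consists of one migrating workflow management machine and a number of work places that have established mutual trust. The management machine lets the workflow initiator organize, manage and monitor workflows; a work place represents an enterprise, institution or individual taking part in the collaborative business and fulfills service commitments to migrating instances. The main component deployed on the management machine is the migrating workflow engine, which supports workflow alliance management, business process definition, and the creation, dispatch and monitoring of migrating instances. If a business process can be decomposed into several sub-processes that run in parallel, multiple migrating instances can be created, each responsible for one relatively independent sub-process. A work place, made up of a docking station and a network of work hosts, is where migrating instances run. The docking station accepts service queries and migration requests from migrating instances, provides the runtime environment and runtime services once an instance arrives, and requests data services and functional services on the work host network on the instance's behalf. If the management machine and a work place are deployed together, every workflow participant can organize and launch its own workflows, so the model allows the management of multiple business processes to run in the same system at the same time.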
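     Because migrating instances run in a cross-organizational, heterogeneous network environment, task execution is easily affected by uncertain factors such as host faults, link faults, communication faults, and faults in service programs and service resources. These factors not only disturb the normal execution of a migrating workflow but may also cause a migrating instance to die prematurely or even cause the whole workflow to fail. Fault tolerance for migrating instances is therefore an indispensable mechanism for guaranteeing the reachability, correctness and reliability of migrating workflows. It covers three aspects: execution fault tolerance, communication fault tolerance and state fault tolerance.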
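     ·Execution fault tolerance: workflow tasks can be executed reliably by migrating instances at every work place. In the migrating workflow model, workflow tasks are completed by migrating instances that move from one work place to the next and use services locally. A work place must provide not only a place to run but also reliable workflow services, and any physical host fault or logical service fault disturbs the normal completion of an instance's tasks. In particular, long transactional tasks that demand high reliability (such as ticket booking or payment) access important databases, so the transactional properties of their operations must be guaranteed, and the execution of the migrating instance needs appropriate fault-tolerance safeguards.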
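     ·Communication fault tolerance: mail exchanged between migrating instances can be sent and delivered reliably. In the migrating workflow model, communication is the basis of cooperation between migrating instances; cooperation succeeds only if mail is reliably sent and delivered. Communication between migrating instances can fail for two reasons: (1) a physical fault on the communication link prevents the mail from being sent; (2) the receiving instance moves, so the mail cannot be delivered reliably, that is, the receiver has already left when the mail reaches the target host. Physical link faults can be handled by retransmitting the mail over a backup link. Delivery failures caused by instance mobility can be tolerated with a well-designed mechanism for tracking instance locations and forwarding mail.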
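     ·State fault tolerance: abnormal states of migrating instances can be captured and recovered promptly. The state of a migrating instance is either a normal execution state or an abnormal state; an abnormal state mainly means that the instance has become untraceable or unusable because of a physical fault or a security attack. Because parallel business sub-processes usually have data and timing relationships, the migrating instances executing one workflow also have specific behavioral dependencies. If the state of one instance becomes abnormal, it may trigger abnormal states or blocked execution in other instances. A state fault-tolerance mechanism must therefore not only capture and recover the abnormal state of a single instance in time, but also analyze how far the abnormality reaches and effectively limit its spread.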
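     Supported by projects of the National Natural Science Foundation of China, and building on the migrating workflow system model and on results from other fields, this thesis studies the fault-tolerant execution model of migrating workflow and its implementation methods, including fault-tolerant execution of migrating instances, reliable communication between migrating instances, and cooperative monitoring and coordinated failure recovery for multiple migrating instances. The results are analyzed and validated through a concrete application case. The main work of the thesis is as follows: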
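     1. Fault-tolerant execution model of migrating workflow
     To make migrating workflow execution reliable, the thesis establishes a system-level fault-tolerant execution model. The model describes the faults present in the system, and the corresponding fault-tolerance mechanisms, at three levels: the service layer, the instance layer and the cooperation layer. The thesis gives the framework of the model, designs an experimental migrating workflow use case, and sets up a fault-tolerant execution environment for migrating workflow.
     2. Fault-tolerant execution model of migrating instances and its implementation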
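     To achieve fault-tolerant execution of migrating instances, the thesis distinguishes two types of workflow task: time-critical tasks and business-critical tasks. Time-critical tasks are short transactional tasks with strict response-time requirements, such as real-time data processing and online software updates. Business-critical tasks are long transactional tasks with high reliability requirements, such as ticket booking, shopping and money transfer. Because business-critical tasks usually modify databases, their execution must have the "exactly-once" property and transactional properties. For migrating instances that execute business-critical tasks, the thesis studies a model for constructing fault-tolerant execution stages based on spatial replication: it introduces the concept of a dynamic stage, defines a dynamic priority, and designs and implements a strategy for selecting the work places of a stage and an algorithm for constructing dynamic stages. Performance and efficiency analysis shows that the model reduces the time and communication overhead of stage commit and improves the efficiency of fault-tolerant execution.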
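     3. Fault-tolerant communication model of migrating instances and its implementation
     To achieve fault-tolerant communication between migrating instances, the thesis addresses mail that cannot be delivered reliably because the receiving instance moves during communication. It studies a reliable communication model for migrating instances based on service-domain partitioning and the "post office and mailbox" principle, giving the definition and architecture of the model, designing the naming and addressing scheme for migrating instances, presenting the main communication algorithms, and analyzing the model's properties and communication efficiency. Experiments show that the model, which imitates real-world mail delivery, is simple to implement and offers good reliability and efficiency.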
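     4. State monitoring model of migrating instances and its coordinated recovery method
     To achieve state fault tolerance, the thesis addresses the inconsistent states and blocked execution that an abnormal state in one migrating instance causes in others. It studies a cooperative monitoring model for concurrent processes carried out cooperatively by multiple agents, together with a corresponding checkpoint algorithm. The thesis defines the cooperative monitoring model, designs monitor management algorithms covering the creation, movement and exit of monitors, and describes how monitoring information is acquired and processed and how checkpointing based on monitoring works. Performance and efficiency analysis shows that the model monitors migrating instances effectively and, through coordination among monitors, reaches a globally consistent state after fault recovery.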
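     The innovations of this thesis are mainly the following:
     1. For the problem of migrating instance execution being blocked by work place faults, a model for constructing fault-tolerant execution stages based on spatial replication is proposed. The model effectively reduces the time and communication overhead of the spatial replication method and makes fault-tolerant execution of migrating instances more practical.
     By planning the task execution stages of a migrating instance sensibly, the model avoids unnecessary revisits to work places during execution and reduces total running time. By setting and computing a dynamic service priority for work places, it lets a work place have different service priorities for different migrating instances at different times, which reflects more accurately how suitable the work place is as a runtime environment for an instance. By preferring, wherever possible, work places that were used in the previous stage, it lets a migrating instance choose the best work places in the execution environment while reducing the communication overhead of stage commit.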
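     2. For communication failures caused by communication link faults and by the mobility of migrating instances, a fault-tolerant communication model for migrating instances based on service-domain partitioning is proposed. The model is simple to use, highly reliable and efficient, and adapts well as the system grows.
     Using the concept of business relevance, the model partitions work places into service domains, sets up a post office in each service domain, creates mailboxes in the post office for migrating instances created in the domain and for those that migrate in from other domains, and builds an address caching mechanism through an acquaintance address book to ease address lookup. Every migrating instance has two mailboxes: a source mailbox and an active mailbox. The active mailbox travels with the migrating instance so that mail can be delivered and retrieved directly; the source mailbox stays at the post office where the instance was created and supports forwarding when direct delivery fails. The model has the following advantages: (1) Dual mailboxes. The dual-mailbox mechanism effectively prevents mail loss during migration and guarantees "exactly-once" delivery of mail. (2) Acquaintance address book. The address book supports efficient, transparent addressing of migrating instances; it shortens address lookup time, reduces the dependence of instance communication on the place of creation, and lowers the overhead of registering and deregistering instances, which makes the system more robust and more efficient.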
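     3. For the problem of capturing and recovering abnormal states of migrating instances, a hierarchical cooperative monitoring model (HCM³) is proposed. The model effectively captures and processes the state information of migrating instances and prevents workflow failure caused by the premature death of an instance.
     The model treats state monitoring of migrating instances as a cooperative process among multiple monitors: several monitors jointly monitor all the migrating instances executing one workflow, and coordination among the monitors achieves the capture, handling and recovery of abnormal states at different levels. The model has the following characteristics: (1) Hierarchical monitoring. Monitors are related hierarchically, mirroring the hierarchy among migrating instances, so monitoring content and methods can be tailored to each instance, abnormal situations at both the instance level and the process level can be diagnosed and handled, and the behavior of monitors at different levels can be coordinated. (2) Concurrent monitoring. Monitoring proceeds at different levels simultaneously, and coordination among monitors brings the states into agreement; this relieves, to some extent, the single-point bottleneck of centralized monitoring and improves monitoring efficiency. (3) Reliability and efficiency. The model spreads the risk of monitoring failure over several monitors and is therefore more reliable than centralized monitoring, while assigning only one monitor to each migrating instance avoids the extra overhead of redundant monitors.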
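     Because of the particular nature of migrating workflow, and because migrating workflow management is a research field that has only just begun, both the theory and its applications are still far from mature. Further work arising from this thesis includes:
     1. Further refinement of the cooperative monitoring model. The cooperative monitoring model is still at the proof-of-concept stage. On the one hand, the system makes assumptions about many parameters, for example it considers only a limited set of fault types and ignores monitor failures; on the other hand, the experimental cases are rather uniform, and no extensive, in-depth quantitative analysis has been carried out on the system. Future work will take fuller account of the complexity and dynamics of the environment, refine the algorithms, and build on the existing qualitative analysis with in-depth quantitative analysis of every aspect of system performance, so as to obtain objective evaluation criteria.
     2. Goal-oriented task decomposition and migrating instance execution. The research in this thesis is based on a process-oriented approach to migrating workflow. Because decomposing business processes and dividing the execution of migrating instances into stages requires the business process to be defined explicitly in advance, system designers must have complete workflow knowledge, which is very hard to obtain for large-scale, cross-organizational collaborative business processes. Future work will study goal-driven migrating workflow mechanisms in order to reduce the dependence of system performance on the designer's prior knowledge and to improve ease of use.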
Migrating workflow is a workflow management technology based on mobile agents. The entity that performs tasks, called a migrating instance, is built on the mobile agent paradigm. A work place maps the network node and services of a workflow participant: the network node is the working site of a migrating instance, and the services it provides comprise runtime services and workflow services. A migrating instance can use local resources to perform one or several tasks at a work place and, if necessary, migrate with its task list and current results to another suitable work place to continue its work. Migrating instances created for the same migrating workflow can work collaboratively to support the management of parallel business processes.
     The migrating workflow management system is composed of a migrating workflow management machine and several work places that trust one another. The management machine lets the sponsor of a workflow organize, manage and supervise it. A work place represents an organization or a corporation that participates in the collaborative work and provides services to migrating instances. The workflow engine, located on the management machine, supports workflow alliance management, business process definition, and the creation, dispatch and monitoring of migrating instances. If a business flow can be divided into several parallel business processes, multiple migrating instances can be created, each performing one of them. A work place, consisting of a docking station and a network of work hosts, is where migrating instances run. The docking station receives queries and migration requests from migrating instances, provides the runtime environment and runtime services when an instance arrives, and requests data services and functional services on the instance's behalf. If the migrating workflow engine and a work place are deployed together, each workflow participant can organize and launch its own workflows, so multiple business processes can run in the same workflow system.
     Migrating instances run in an inter-organizational network, so task execution is prone to uncertainty, e.g. host faults, channel failures, communication failures, and service or resource malfunctions. Such faults disturb the execution of a migrating workflow and can kill a migrating instance or, worse, abort the whole workflow. Fault tolerance for migrating instances is therefore necessary to ensure the reachability, correctness and reliability of migrating workflows. It has three aspects: fault tolerance of execution, fault tolerance of communication and fault tolerance of state.
     ·Fault tolerance of execution: a migrating instance can perform tasks reliably at any work place. In a migrating workflow system, workflow tasks are performed by migrating instances that move from work place to work place and use local services. A work place provides not only a runtime environment but also reliable workflow services for migrating instances, and physical faults or logical malfunctions can disturb their normal execution. Long transactional tasks that demand high reliability, such as booking or payment, access important databases, so their transactional properties must be ensured. A fault-tolerance scheme is therefore indispensable for migrating instances.
     ·Fault tolerance of communication: mail exchanged between migrating instances can be sent and delivered reliably. In a migrating workflow system, communication is the basis of cooperation, and cooperation succeeds only when mail is sent and delivered reliably. Two factors can cause communication between migrating instances to fail: (1) physical faults of the communication channel, which prevent mail from being sent; (2) migration of the receiving instance, which prevents mail from being delivered because the receiver has already left when the mail reaches the target. For physical faults, mail can be resent through a backup channel; for delivery failures caused by instance movement, a location tracking and mail forwarding mechanism is needed.
     ·Fault tolerance of state: exceptional states of migrating instances can be caught and recovered. The state of a migrating instance is either regular or exceptional. An exceptional state means that the instance cannot be tracked or is unavailable because of a physical fault or an attack. Business processes executed in parallel are often related through data or timing, so if one migrating instance becomes exceptional, others may become exceptional or blocked. State fault tolerance must therefore not only catch and recover exceptional states promptly, but also determine the affected scope and restrict the spread of exceptional states.
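     To illustrate what determining the affected scope of an exceptional state can mean, the following minimal sketch walks a dependency relation between migrating instances. It assumes the dependencies are available as a plain mapping; the names `depends_on` and `affected_scope` are hypothetical and do not come from the thesis.

```python
from collections import deque


def affected_scope(depends_on, failed):
    """Return every instance that directly or transitively depends on `failed`.

    `depends_on` maps an instance id to the set of instance ids whose data or
    timing it relies on (an assumed representation, for illustration only).
    """
    # Invert the relation: provider -> instances that depend on it.
    dependents = {}
    for inst, providers in depends_on.items():
        for provider in providers:
            dependents.setdefault(provider, set()).add(inst)

    scope = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for inst in dependents.get(current, ()):
            if inst != failed and inst not in scope:
                scope.add(inst)
                queue.append(inst)
    return scope


deps = {"B": {"A"}, "C": {"B"}, "D": set()}
print(sorted(affected_scope(deps, "A")))
```

     In this example instance D has no dependency on A, so it stays outside the scope and could keep running while B and C are held or rolled back.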
     This study is supported mainly by the National Natural Science Foundation of China under Grants No. 60473123 and No. 60573169. Building on the migrating workflow framework and drawing on research results from related fields, the thesis focuses on the fault-tolerance model of migrating workflow. It presents implementation schemes including a fault-tolerant execution model, a reliable communication model for migrating instances, and a collaborative monitoring and coordinated recovery scheme. The results are analyzed and validated through an experimental case. The main contributions of the thesis are as follows:
     1. Research on the fault-tolerant execution model of migrating workflow
     To perform workflow tasks reliably, the thesis presents a fault-tolerant execution model of workflow. The model has a hierarchical structure made up of a service layer, an instance layer and a coordination layer. The possible faults of the three layers are described and the corresponding fault-tolerance schemes are established. In addition, the framework of the fault-tolerance model is given, an experimental case is devised, and an experimental environment is set up, which together form the basis of the later chapters.
     2. Research on the fault-tolerant execution model and implementation of migrating instance.
     To implement fault-tolerant execution of migrating instances, workflow tasks are divided into two types: time-critical tasks (TCT) and business-critical tasks (BCT). The former are short transactional tasks with strict response-time requirements, e.g. real-time data processing and online software updating; the latter are long transactional tasks requiring high reliability, e.g. booking, paying and money transfer, whose transactional properties must be ensured when they modify an important database. The thesis provides a fault-tolerant stage construction model based on the space replication method, defines the dynamic stage and the dynamic priority, and implements a stage work place selection algorithm and a dynamic stage construction algorithm. Performance analyses and experimental results show that the model reduces the time and communication costs of stage submission and hence improves the efficiency of fault-tolerant execution.
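     The idea of choosing the work places of a stage by a dynamic priority can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the priority formula, so the score used here (spare capacity plus a bonus for work places used in the previous stage) and the names `WorkPlace`, `dynamic_priority`, `select_stage` and `reuse_bonus` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkPlace:
    name: str
    services: frozenset  # workflow services offered by this work place
    load: float          # 0.0 = idle, 1.0 = saturated, sampled at selection time


def dynamic_priority(place, required, previous_stage, reuse_bonus=0.5):
    """Score a work place for one instance at one moment (assumed formula)."""
    if not required <= place.services:
        return None  # the work place cannot host the task at all
    score = 1.0 - place.load
    if place.name in previous_stage:
        # Reusing a place from the previous stage is assumed to save
        # communication when the stage is committed.
        score += reuse_bonus
    return score


def select_stage(places, required, previous_stage, replicas):
    """Pick the `replicas` best work places for the next stage."""
    scored = []
    for place in places:
        score = dynamic_priority(place, required, previous_stage)
        if score is not None:
            scored.append((score, place.name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:replicas]]


places = [
    WorkPlace("a", frozenset({"pay"}), 0.2),
    WorkPlace("b", frozenset({"pay"}), 0.5),
    WorkPlace("c", frozenset({"book"}), 0.0),
    WorkPlace("d", frozenset({"pay", "book"}), 0.1),
]
print(select_stage(places, frozenset({"pay"}), {"b"}, replicas=2))
```

     Because the score depends on the requesting task, the current load and the previous stage, the same work place can rank differently for different instances and at different times, which is the property the dynamic priority is meant to capture.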
     3. Research on the fault-tolerant communication model and implementation of migrating instances.
     To implement fault-tolerant communication between migrating instances, we studied a reliable communication model based on service domains and the "postoffice-mailbox" mode. The model targets mail delivery failures caused by the movement of migrating instances. The thesis describes the definition of the communication model and the corresponding system framework, describes the naming and address-locating scheme, proposes the main communication algorithms, and analyzes the model's characteristics and communication efficiency. Experimental results show that the communication scheme is simple, reliable and efficient.
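     A toy version of the postoffice-mailbox idea is sketched below. It assumes the two-mailbox arrangement described later in this abstract (a source mailbox at the home postoffice and an active mailbox carried by the instance) and simplifies forwarding: mail that misses the receiver is parked in the source mailbox and collected at the receiver's next arrival. The classes, the `send` function and the use of mail ids for duplicate suppression are illustrative assumptions, not the thesis algorithms.

```python
class PostOffice:
    def __init__(self, name):
        self.name = name
        self.present = {}       # instance id -> instance currently in this domain
        self.source = {}        # instance id -> source mailbox (instances created here)
        self.address_book = {}  # instance id -> postoffice where it was last seen


class Instance:
    def __init__(self, ident, home):
        self.ident = ident
        self.home = home
        self.location = home
        self.inbox = []     # active mailbox, travels with the instance
        self.seen = set()   # ids of mail already accepted
        home.source[ident] = []
        home.present[ident] = self

    def accept(self, mail):
        mail_id, body = mail
        if mail_id not in self.seen:  # drop duplicates: at most one acceptance
            self.seen.add(mail_id)
            self.inbox.append(body)

    def migrate(self, target):
        del self.location.present[self.ident]
        target.present[self.ident] = self
        self.location = target
        # On arrival, collect anything parked in the source mailbox at home.
        for mail in self.home.source[self.ident]:
            self.accept(mail)
        self.home.source[self.ident].clear()


def send(sender_office, receiver_id, receiver_home, mail):
    """Try the cached address first; fall back to the source mailbox."""
    cached = sender_office.address_book.get(receiver_id)
    if cached is not None and receiver_id in cached.present:
        cached.present[receiver_id].accept(mail)
        return "direct"
    sender_office.address_book.pop(receiver_id, None)  # forget the stale address
    receiver_home.source[receiver_id].append(mail)
    return "parked"
```

     A sender whose cached address is stale loses nothing: the mail waits in the source mailbox, and the id check in `accept` keeps a retransmitted copy from being delivered twice.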
     4. Research on the state monitoring and coordinated recovery model and its implementation.
     To implement fault tolerance of state, we studied a collaborative monitoring model for collaborative, parallel processes with multiple executing agents, together with a corresponding checkpoint algorithm. The model targets the inconsistent states and execution blocking caused by an exceptional state of a migrating instance. The thesis presents the collaborative monitoring model and the monitor management algorithms, describes how monitoring information is captured and processed, and provides a checkpoint method based on the monitoring model. Performance analyses and experimental results show that the model monitors migrating instances effectively and, by coordinating monitors, recovers from failure to a consistent state.
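     The interplay of per-instance monitors, a coordinating monitor and checkpoints can be pictured with the sketch below. It assumes heartbeat timeouts as the way an exceptional state is detected and numbered checkpoint epochs as the way a consistent recovery point is found; both are common techniques chosen here for illustration, and the classes `InstanceMonitor` and `ProcessMonitor` are hypothetical rather than taken from the thesis.

```python
class InstanceMonitor:
    """Watches exactly one migrating instance."""

    def __init__(self, instance_id, timeout):
        self.instance_id = instance_id
        self.timeout = timeout
        self.last_seen = 0.0
        self.checkpoints = {}  # epoch number -> saved state

    def heartbeat(self, now):
        self.last_seen = now

    def save(self, epoch, state):
        self.checkpoints[epoch] = state

    def is_exceptional(self, now):
        # An instance that has been silent for too long is treated as
        # untrackable or unavailable.
        return now - self.last_seen > self.timeout


class ProcessMonitor:
    """Coordinates the monitors of all instances of one workflow."""

    def __init__(self, children):
        self.children = list(children)

    def failed(self, now):
        return [m.instance_id for m in self.children if m.is_exceptional(now)]

    def recovery_line(self):
        """Latest epoch for which every monitor holds a checkpoint."""
        if not self.children:
            return None
        common = set.intersection(*(set(m.checkpoints) for m in self.children))
        if not common:
            return None
        epoch = max(common)
        return epoch, {m.instance_id: m.checkpoints[epoch] for m in self.children}
```

     Rolling every instance back to the same epoch is one simple way to reach a globally consistent state; a checkpoint that only some monitors hold is ignored when the recovery point is chosen.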
     The main innovative contributions of this thesis are:
     1. To avoid execution blocking caused by work place failures, a fault-tolerant execution stage construction model based on space replication is provided. The model makes stage construction more efficient, reduces time and communication costs, and makes fault-tolerant execution more usable.
     The model plans the task execution of a migrating instance according to the executing ability of each work place, avoids unnecessary revisits to work places, and shortens the total running time of the instance. It also defines a method for evaluating work places called dynamic priority: the priority of a work place differs for different migrating instances and at different times, so it reflects how suitable the work place is as the runtime environment of a given instance. In addition, a work place selection algorithm is provided that selects the most suitable work places for a migrating instance while reducing the communication costs of stage submission.
     2. To address communication failures caused by network faults or by the movement of migrating instances, a reliable communication model based on service domains is presented. Compared with traditional mobile agent communication methods, the model is easy to use, reliable, efficient and adaptable to larger systems.
     The communication model divides the work places into several service domains, each of which has a postoffice that holds the mailboxes of migrating instances. Each migrating instance has two mailboxes: a source mailbox and an active mailbox. The source mailbox stays at the home postoffice, while the active mailbox travels with the migrating instance. An address_book at each postoffice caches the addresses of migrating instances that have been contacted, for use in later queries. The model has the following advantages: (1) Double mailboxes. The source mailbox at the home postoffice (hmb) acts as a guide, while the active mailbox (amb) is the component that actually receives messages; this ensures reliable delivery with the exactly-once property and reduces the bandwidth consumed by triangular routing and the overhead of registering and deregistering. (2) Transparent and efficient addressing. The addressing strategy shortens addressing time and lessens the dependency on the home, making the system more robust and scalable. (3) Fault tolerance. The model avoids message loss caused by the movement of migrating instances and ensures in-order, exactly-once delivery of mail. Initial experiments show that the model satisfies the reliability, adaptability and efficiency requirements of a migrating workflow system. Future work will focus on establishing a more secure system that can be applied in more general environments.
     3. To catch and recover exceptional states, a hierarchical collaborative monitoring model (HCM³) is provided. The model captures and handles the states of migrating instances and avoids workflow failure caused by the death of a migrating instance.
     The model regards monitoring of the whole workflow as a coordinated parallel process: it dispatches multiple monitors that collaboratively monitor all the migrating instances performing a workflow, and catches, handles and recovers exceptional states at different levels through coordination among the monitors. The model has the following merits: (1) Hierarchical. Monitors are related hierarchically, so monitoring can be tailored to different migrating instances, exceptions can be diagnosed when they appear, and the work of monitors at different levels can be coordinated. (2) Parallel. Monitoring proceeds in parallel, and the system state is kept consistent through coordination among monitors; the model thus avoids a single point of failure and improves monitoring efficiency. (3) Reliable and efficient. The model is reliable because it spreads monitoring tasks over several monitors; at the same time, only one monitor is assigned to each migrating instance, which limits the additional cost of redundant monitors.
     Since migrating workflow is an emerging research field, it is far from mature in both theory and application. To further the study started in this thesis, the author proposes the following future work:
     1. HCM³ needs further improvement. The model is at an early stage and not yet mature, because many assumptions were made when building the system and only a simple case was used in the experiments. Further work will refine the model by considering the complexity and dynamics of the application environment. In addition to qualitative analyses, thorough quantitative analyses of many aspects of the model will be carried out to obtain a more objective evaluation.
     2. Target-oriented task decomposition and migrating instance execution will be studied. In this thesis, flow decomposition and instance dispatching amount to a direct, simple division of the business flow, based on an explicit definition and complete knowledge of that flow. Adaptability is achieved by binding implementation details to migrating instances at runtime. Further work will study a target-oriented workflow scheme that lessens the dependency on the designer's knowledge of the workflow and hence improves usability.

引文

[Alan2002] Alan Fedoruk, Ralph Deters: Improving fault-tolerance by replicating agents. AAMAS 2002: 737-744

    [Alonsol994] Alonso G, Agrawal D, Abbadi E A, et al. ExoticaFMQM: a persistent message-based architecture for distributed workflow management. Technical Report, RJ9912, IBM Almaden Research Center, 1994

    [Alonso2000] G. Alonso, C.Hagen, D. Agrawal, A. E. Abbadi, C.Mohan,Enhancing the Fault Tolerance of Workflow Management Systems. IEEE Concurrency. July-September, 2000: 74-81

    [Alvisil999] L. Alvisi, E. Elnozahy, S. A. Husain, A.D.Mel, An Analysis of Communication Induced Checkpointing, Fault-tolerant computing, Digest of Papers, Twenty-ninth Annual International Symp.,1999: 242-249

    [Ashfield2002] B.Ashfield, D.Deugo, F. Oppacher, T.White, Distributed deadlock detection in mobile agent systems. IEA/AIE 2002, LNAI 2358, 2002:146-156

    [Baik2003] M.Baik, I.Kang, C.Hwang, and Y. Rang, Optimistic Fault-Tolerant Approach for Mobile Agent in Multi-Region Mobile Agent Computing Environment, in Proc. PDPTA, 2003:1238-1243

    [Baumann1998] J. Baumann, K. Rothermel, The shadow approach: an orphan detection protocol for mobile agents, MA'98, LNCS 1477, 1998:2-13

    [Bernstein2000] Bernstein, A., Populating the specificity frontier:IT-support for dynamic organizational processes. PhD Dissertation, MIT,2000.

    [Bhargaval988] B. Bhargava, Shy-Renn Lian, Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems—An Optimistic Approach, IEEE Proc. 7th Symp. on Reliability in Distributed Systems, Oct.1998:3-12

    [Blake2003] Blake M. B., Agent-Based Communication for Distributed Workflow Management using Jini Technologies, International Journal on Artificial Intelligence Tools (IJAIT), 2003,12[1]:81-99

    [Cabril998]Cabri, G., Leonardi, L., Zambonelli, F. Reactive tuple spaces for mobile agent coordination. In:Rothermel, K., Hohl, F.,eds. Proceedings of the 2~(nd) Mobile Agents International Workshop. LNCS1477,Stuttgart:Springer-Verlag, 1998:237-248.

    [Caol992] J. Cao, K.C.Wang, Efficient Synchronous Checkpointing in Distributed Systems, Australia Computer Science Communicarions, 1992,14[1]:165-179

    [CA0JN2002] Jiannong Cao , Xinyu Feng , Jian Lu , Sajal K. Das,Mailbox-Based Scheme for Designing Mobile Agent Communication Protocols,Computer, 2002,35[9]:54-60

    [CA0JN2001] Jiannong Cao, G.H. Chan, W. Jia, and T. Dillon, Checkpointing and Rollback of Wide-Area Distributed Applications Using Mobile Agents,Proc. IEEE 2001 International Parallel and Distributed Processing Symposium (IPDPS2001) (IEEE Computer Society Press), April 2001, San Francisco, USA.

    [CA0JN2004]Jiannong Cao, Yifeng Chen, Kang Zhang, Yanxiang He,Checkpointing in Hybrid Distributed Systems, International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04), 2004:136-142

    [Chanl998] Chan D H K, Vonk J, Sanchez G, et al. A specification language for the WIDE workflow model EB/OL . Proc. of the 1998 ACM symposium on Applied Computing, Atlanta, 1998: 197- 199

    [Chandyl985] K. M. Chandy, L.Lamport, Distributed Snapshots:Determining Global State of Distributed Systrms, ACM Trans Computer Systems, 1985,3[1]:63-75
    [CHENGX2003]程欣,杨孝宗,移动代理的故障模型分析,计算机工程与应用,2003,39[13]:16-18
    [Choi2004]S.Choi,M.Baik,H.Kim,J.Yoon,J.Shon,C.Hwang,Region-based Stage Construction Protocol for Fault tolerant Execution of Mobile Agent.Advanced Information Networking and Applications,2004.AINA 2004.18th International Conference on,Volume:2,2004:499-502
    [Cichocki1998]A.Cichoeki,A.Rusinkiewiez,M,Migrating Workflows,in Dogac,A.et.al(eds.):Workflow Management Systems and Interoperability.Berlin,Heidelberg(springer Verlag),1998:339-355
    [Cichocki1999]A.Ciehoeki,M.Rusinkiewicz,Providing Transactional Properties for Migrating Workflows.IEEE 10th Workshop on database &Expert Systems Applications,September,Florence,Italy,1999:90-94
    [Cichocki2004]A.Cichocki,M.Rusinkiewicz,Providing Transactional Properties for Migrating Workflows.Mobile Networks and Applications,2004,[9]:473-480.
    [Dalmeijer1998]M.Dalmeijer,E.Rietjens,D.Hammer,A.Aerts,M.Schoede.A Reliable Mobile Agents Architecture.Proceedings of ISORC98,1998:64-72
    [Danny1999]Danny B.Lange and Mitsuru Oshima.Seven Good Reasons for Mobile Agents.Communications of the ACM,1999,42[3]:88-98
    [Das1997]S.Das,K.Kochut,J.Miller,A.Sheth,and D.Worah,ORBWork:A Reliable Distributed CORBA-based Workflow Enactment System for METEOR_2,Technical Report UGA-CS-TR-97-001,LSDIS Lab,CS Department,Univ.of Georgia,February 1997.
    [DENGSG2004]邓水光,吴朝晖,俞镇支持动态建模的工作流管理系统的研究与设计,计算机辅助设计与图形学学报,2004,16[5]:712-718
    [Edward2001]Edward Ao Stohr,J.Loan Zhao,Workflow Automation:Overview and Research Issues,Information Systems Frontiers 2001,3[3]:281-296
    [Ehrler2005]L.Ehrler,M.Fleruke,M.Purvis,et al.Agent-based Worklow Management Systems(WfMSs) JBees:a distributed and adaptive WfMS with monitoring and controlling capabilities,Information Systems and E-Business Management,2006,4[1]:5-23
    [FANGC2003]范国闯等,Web应用服务器综述,软件学报,2003,14[10]:1728-1739.
    [FANYS2000]范玉顺,吴澄工作流管理技术研究与产品现状及发展趋势,计算机集成制造系统CIMS,2000,1[6]:1-7
    [FANYS2001]范玉顺主编,工作流管理技术基础—实现企业业务过程重组、过程管理与业务过程自动化的核心技术,北京:清华大学出版社,施普林格出版社,2001
    [FANYS2002]范玉顺,吴澄,一种提高系统柔性的工作流建模方法研究,软件学报,2002,13[4]:833-839。
    [Feng2001]Feng XY,Cao JN,L(u|¨) J,Chan H,An Efficient Mailbox-Based Algorithm for Message Delivery in Mobile Agent Systems,LNCS2240,Springer-Verlag,2001:135-151
    [FENGHJ2004]冯华君,朱信忠,赵建民,基于Agent的工作流系统探讨,计算机工程,2004,30[14]:166-168
    [Finin1998]T.Finin,Y.Labrou,Y.Peng,Mobile Agents can Benefit from Standards Efforts on Interagent Communication.IEEE Communication Magazine,July,1998:50-56
    [Foster1999]S.Foster,D.Moore,M.Flester,B.Nebesh,Control and Management in a Mobile Agent Workflow Architecture,Proceedings of Agents'99,Seatle,May,1999.
    [Georgakopoulos1995]D.Georgakopoulos,M.Hornick,A.Sheth,An Overview of Workflow Management:From Process Modeling to Workflow Automation Infrastructure.Distributed and Parallel Databases,3,1995:119-153
    [GOU2000]Hongmei Gou,Biqing Huang,Wenhuang Liu,Shouju Ren,Yu Li,An agent-based approach for workflow management,Systems,Man,and Cybernetics, 2000 IEEE International Conference on, 2000:292-297

    [Gravesl997] R. J.Graves. Guest editorial: Intelligent Autonomous Agents in Product. International Journal on Production Planning and Control,1997,8[8]:725-726

    [Gray2002] R. Gray, G. Cybenko, D. Kotz, and D. Rus, Mobile Agents:Motivations and State of the Art, Handbook of Agent Technology, AAAI/MIT Press, 2002.

    [Hagen2000] Hagen, C.; Alonso, G..Exception handling in workflow management systems, IEEE Transactions on Software Engineering, 2000,26[10]: 943-958

    [Henrik2000] Henrik Stormer, Task Scheduling in Agent-Based Workflows.In: Proceedings of the International ICSC Symposium on Multi-Agents and Mobile Agents in Virtual Organizations and E-Commerce (MAMA), Wollongong,Australia, 2000

    [Hevia2000] A. Hevia, A. Vasa. Fault-Tolerant Protocols for Mobile Agents:A Survey. 2000, www.cs.ucsd.edu/classes/sp00/cse221/reports/hev-vas.ps

    [HUT2004] Tao Hu, Li Baohong.A Workflow Coordination Model for Mobile Agents based on Role and Task, In Proc. Of 2004 IEEE Intemational Conference on Systems, Man and Cybemetics. 2004:3875-3879
    [Jennings2000] Jennings N. R., Faratin P., Norman T. J., O'Brien P. and Odgers B. Autonomous Agents for Business Process Management. Int. Journal of Applied Artificial Intelligence 2000,14[2]: 145-189
    [Jin2004] G. Jin, B. Ahn and K. D. Lee, A Fault-Tolerant Protocol for Mobile Agent, A.Lagana et al, (Eds.):ICCSA 2004, LNCS 3045, pp. 993-1001, 2004.

    [Pleisch2003] S. Pleisch and A. Schiper, Fault-tolerant mobile agent execution, IEEE Trans. Comput., 2003,52[2]:209-222
    [Johansenl999] D. Johansen, K. Marzullo, F. B. Schneider, K. Jacobsen,and D. Zagorodnov. NAP: Practical fault-tolerance for itinerant computations.In Proc.of the 19th IEEE International Conference on Distributed Computing Systems(ICDCS),Austin,Texas,June 1999:180-189.
    [Kamath1998]M.Kamath,K.Ramamritham,Failure handling and coordinated execution of concurrent workflows,ICDE 1998:334-341
    [Karen2003]Karen Witting,Jim Challenger,Brian O'Connell,Monitoring Distributed Systems:A Publish/Subscribe Methodology and Architecture.IFIP/IEEE Eighth International Symposium on Integrated Network Management(IM 2003),March 24-28,2003:89-92
    [Karnik1998]Karnik,N.M.,Tripathi,A.R.,Design Issues in Mobile-Agent Programming Systems,IEEE Concurrency,1998,6[3]:52-61
    [Kim2002]H.Kim,H Y.Yeom,T.Park,H.Park The Cost of Checkpointing,Logging and Recovery for the Mobile Agent Systems,Proceedings of the 2002Pacific Rim International Symposium on Dependable Computing(PRDC'02),2002:45-48
    [Kotz2002]D.Kotz,R.Gray,and D.Rus,Future Directions for Mobile-Agent Research,TechnicalReportTR2002-415,Dept.of Computer Science,Dartmouth College,Jan.2002.
    [Kwok1993]A.Kwok et D.Norrie,Intelligent Agent Systems for Manufacturing Applications.Journal of Intelligent Manufacturing,1993,4:285-293,1993.
    [LAIYD2003]赖耀东,朱建新,基于多Agent虚拟组织工作流管理系统的异常处理机制计算机工程,2003,29[2]:68-69,124
    [Lee2000]D.Lee,B.Jeon,Y.Kim,Mobile agents for reliable migration in networks,IDEAL 2000,LNCS 1983,2000:344-351
    [LIHC1999]李红臣,史美林,Agent在工作流管理系统中的应用研究,通信学报,1999,20[9]:16-22
    [LIHX2003]李红霞,王晓琳,曾广周,迁移工作流系统中的自适应信任模型,计算机应用,2003,23[11]:97-99
    [LIHX2004]李红霞,王晓琳,曾广周,迁移工作流系统中的迁移域组织与动态迁移实例寻址研究,计算机工程与应用,2004,36:98-101
    [LIUR2006]柳荣,徐东安,基于移动Agent工作流系统的工作模式,计算机工程,2006,32[17]:135-137
    [LIUTT2005]刘添添,移动agent系统的一种安全容错机制,计算机工程,2005,31[18]:116-118
    [LUOHB2000]罗海滨,范玉顺,吴澄,工作流技术综述,软件学报2000,11[7]:899-907
    [LUX2002]陆新,姜浩,移动Agent在分布式工作流管理系统中的应用,东南大学学报(自然科学版),2002,32[1]:119-123.
    [LUZX2003]卢朝霞,曾广周,Petri网在商务工作流建模中的应用研究,计算机工程与应用,2003,39[19]:199-202.
    [Lyu2003]M.R.Lyu and T.Y.Wong,A Progressive Fault Tolerant Mechanism in Mobile Agent Systems.in Proceedings 7th World Multiconference on Systemics,Cybernetics and Informatics(SCI2003),Orlando,Florida,July 27-30 2003,Volume Ⅸ:299-306.
    [Lyu2004]Lyu,M.R.,Xinyu Chen,Tsz Yeung Wong.Design and Evaluation of a Fault-Tolerant Mobile-Agent System.Intelligent Systems,IEEE[see also IEEE Expert],2004,19[5]:32-38.
    [Meng2000]J.Meng,A.Helal,and Stanley Su,An Ad-noc Workflow System Architecture Based on Mobile Agent and Rule-Based Processing,The special session on Software Agent-Oriented Workflows,Proceedings of the International Conference on Parallel and Distributed Computing Techniques and Applications,Las Vegas,Nevada,June 2000:245-251.
    [Meng2002]J.Meng,Stanley Y.W.Su,Herman Lam,Abdelsalam Helal,Achieving Dynamic Inter-Organizational Workflow Management by Integrating Business Processes,Events and Rules.HICSS 2002:10-17
    [Mishra2003]S.Mishra,P.Xie,Interagent Communication and Syschronization Support in the DaAgent Mobile Agent-Based Computing System. IEEE Transactions on parallel and distributed systems, 2003, 14[3]:290-306

    [Mohammadi2005] H. Hamidi, K. Mohammadi, Modeling and evaluationg of fault tolerant mobile agents in distributed systems, Proc. Of the 2~(nd) IEEE conf. on wireless & optical communications networks (WOCN2005), March 2005:91-95

    [Murphyl999] Murphy A, Picco GP, Reliable communication for highly mobile Agents. In: Proceedings of the Agent Systems and Architecture/Mobile Agents (ASA/MA)'99. 1999:141-150.

    [Nichols2005] J. Nichols, H. Demirkan, M. Goul: Towards a Model of Fault Tolerance Technique Selection in Static and Dynamic Agent-Based Inter-Organizational Workflow Management Systems. HICSS 2005: 188-195

    [Osman2004] T. Osman W. Wagealla and A. Bargiela, An Approach to Rollback Recovery of Collaborating Mobile Agents, IEEE Trans. Systems, Man and Cybernetics, Part C, 2004, 34[1]:48 - 57.

    [Ouelhadj2000] D.Ouelhadj, C. Hanachi, B. Bouzouia, Multi-agent Architecture for Distributed Monitoring in Flexible Manufacturing Systrms (FMS). Proc. of the 2000 IEEE International Conference on Robotics & Automation. 2000:2416-2421.

    [Park2002] T. Park, I. Byun, H.Kim, H Y. Yeom, The Performance of Checkpointing and Replication Schemes for Fault Tolerant Mobile Agent Systems, 21st IEEE Symposium on Reliable Distributed Systems (SRDS' 02) ,2002:256-261

    [Park2004a] T.Park, I. Byun, H Y.Yeom, Lazy agent replication and asynchronous consensus for the fault-tolerant mobile agent system,NETWORKING 2004, LNCS 3042, 2004:1060-1071

    [Park2004b] T. Park, A.Sood, MARE: A Fault-Tolerant Mobile Agent System, ICOIN 2004,LNCS 3090,2004:1035-1044
    [Park2004c]T.Park,h Fault-Tolerant Mobile Agent Model in Replicated Secure Services,ICCSA 2004,LNCS 3043,2004:500-509
    [Pears2003]S.Pears,J.Xu,C.Boldyreff,Mobile agent fault tolerance for information retrieval applications:an exception handling approach,ISADS'03,2003:115-125
    [PENGY2005]彭勇,基于移动Agent的远程监控系统的设计与实现,计算机工程与应用,2005,41[5]:224-228
    [Pham1998]V.Pham,A.Karmouch,Mobile Software Agents:An Overview,IEEE Communications magazine,1998:26-38
    [Pleisch2000]S.Pleisch,A.Schiper.Modeling Fault Tolerant Mobile Agent Execution as a Sequence of Agreement Problems.Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems(SRDS).Nuremberg,Germany.Oct.2000:11-20
    [Pleisch2001]S.Pleisch,A.Schiper.FATOMAS-A Fault-Tolerant Mobile Agent System Based on the Agent-Dependent Approach.Proceedings of International Conference on Dependable Systems and Networks(DSN'01).Goteborg,Sweden,July,2001:215-224
    [Pleisch2003]S.Pleisch and A.Schiper,Fault-tolerant mobile agent execution,IEEE Trans.Comput.,Feb.2003,52[2]:209-222
    [Roberto2003]Roberto Silveira Silva Filho,Jacques Wainer,Edmundo R.M.Madeira:A Fully Distributed Architecture for Large Scale Workflow Enactment.Int.J.Cooperative Inf.Syst.2003,12[4]:411-440
    [Romanovsky2001]Romanovsky,A.Looking ahead in atomic actions with exception handling,Reliable Distributed Systems,2001.Proceedings.20th IEEE Symposium on 28-31 Oct.2001:142-151.
    [Rong2001]Rong Xie,Daniela Rus,Cliff Stein,Scheduling Multi-task Agents,Proceedings of the 5th International Conference on Mobile Agents, LNCS 2240,2001:260-276
    [Rothermel1998]K.Rothermel and M.Strasser.A fault-tolerant protocol for providing the exactly-once property of mobile agents.In Proc.of the 17th IEEE Symposium on Reliable Distributed Systems(SRDS),West Lafayette,Indiana,Oct.1998:100-108.
    [Ryszard2003]Ryszard K.,Developing Mobile and Intelligent Agents in Interconnected e-Marketplaces,Transactions of the Society for Design and Process Science,2003,7[3]:109-123.
    [Schneider1997]F.Schneider.Towards fault-tolerant and secure agentry.In Proceedings of the 11th International Workshop on Distributed Algorithms,Saarbrücken,Germany,Sept.1997:1-14
    [Sen2005]S.Sen,H.Demirkan,M.Goul,Towards a Verifiable Checkpointing Scheme for Agent-Based Interorganizational Workflow System "Docking Station" Standards:System Sciences,2005.HICSS'05.Proceedings of the 38th Annual Hawaii International Conference on,03-06 Jan.2005:165a -165a
    [Seng2001]Seng Wai Loke,Arkady B.Zaslavsky:Towards Distributed Workflow Enactment with Itineraries and Mobile Agent Management.E-Commerce Agents 2001:283-294
    [SHIDX2002]史殿习,吴泉源,王怀民,邹鹏,嵌套式动态容错协议的研究与设计,软件学报,2002,13[2],235-238
    [SHIML1999]史美林等,WfMS:工作流管理系统,计算机学报,1999,22[3]:325-334
    [Silva2000a]L.Silva,V.Batista,and J.Silva.Fault-tolerant execution of mobile agents.In Proc.of the International Conference on Dependable Systems and Networks,New York,June 2000:135-143.
    [Silva2000b]F.M.A.Silva,R.J.A.Macedo,Reliability requirements in mobile agent systems.Proc.of the 2nd Workshop on Tests and Fault Tolerance (II WTF2000),Curitiba,Brazil,2000:1344-1350
    [Singh1999]Singh,M.P.,Huhns,M.N.,Multiagent Systems for Workflow,Int.Journal of Intelligent Systems in Accounting,Finance and Management,1999,Vol.8,105-117.
    [Stormer2000]H.Stormer,K.Knorr,J.Eloff.,A model for security in agent-based workflows.INFORMATIK INFORMATIQUE,2000,6[3]:24-29
    [Straßer1998]M.Straßer,K.Rothermel,C.Maihöfer.Providing Reliable Agents for Electronic Commerce.Lecture Notes in Computer Science.Volume 1402,1998
    [SUNRZ2003]孙瑞志,史美林,支持工作流动态变化的过程元模型,软件学报,2003,14[1]:62-67。
    [Tanaka2007]Y.Tanaka,N.Hayashibara,M.Takizawa,T.Enokido,A mobile agent model for fault-tolerant manipulation on distributed objects,Cluster Comput,2007,10:81-93
    [TANZP2003]谭支鹏,易宝林,冯玉才,王元珍,基于Agent的工作流管理系统的研究,华中科技大学学报(自然科学版),2003,31[3]:46-48
    [TingC1996]Cai Ting,Gloor A,Nog S.DartFlow:A Workflow Management System on the Web Using Transportable Agents.Technical Report PCS-TR96-283,Dartmouth College,1996.
    [TAOXP2000]陶先平,冯新宇,李新,张冠群,吕建.Mogent系统的通信机制.软件学报,2000,11(8):1060-1065
    [TAOY2001a]陶冶,范玉顺,罗海滨,分布式工作流系统的可扩展性和柔性研究,信息与控制,2001,30[3]:218-223
    [TAOY2001b]陶冶,范玉顺,罗海滨,分布式工作流系统的可靠性研究,计算机科学,2001,28[5]:6-10
    [Vogler1997]H.Vogler,Thomas Kunkelmann,M.Moschgath,Distributed Transaction Processing as a Reliability Concept for Mobile Agents.FTDCS 1997:59-65
    [WANG2005]M.Wang,H.Wang,D.Xu.The design of intelligent workflow monitoring with agent technology.Knowledge-Based Systems 18(2005):257-266
    [WANGH2001]王红,曾广周,林守勋.可移动agent系统位置透明通信的一种实现.计算机学报,2001,24[4]:442-446
    [WANGHQ2001]Huaiqing Wang,Dongming Xu,Collaborative Multi-agents for Workflow Management.Proc.of the 34th Hawaii International Conference on System Science,2001:1015-1023
    [WANGJ2004]王静,曾广周.轻量级迁移实例的实现研究,计算机工程,2004,30[22]:137-139
    [WANGJH2001]王建华,刘卫东,徐万鸿,基于agent的工作流模型的研究与应用,计算机工程与应用,2001,37[17]:60-62
    [WANGK1999]王恺,邓铁清,周堤基,协同工作流Agent的研究,系统工程与电子技术,1999,21[8]:48-51
    [WANGMH2002]Minhong Wang,Huaiqing Wang,Intelligent Agent Supported Flexible Workflow Monitoring System,CAISE 2002:787-791
    [WANGT2001]汪涛,吴耿锋,黄力芹,工作流管理的现状和未来趋势,小型微型计算机系统,2001,22[2]:232-236
    [WANGXH2003]王晓宏,孙壮志,计算机协同设计中工作流可靠性的研究,计算机工程与应用,2003,39[3]:47-49
    [WANGY2003]汪芸,分布环境下容错组成员主动退出组行为的研究,中国科学(E辑),2003,33(12),1077-1086
    [WANGYH2005]Ying-Hong Wang,Huan-Chao Keh,Tsang-Ching Hu,Cheng-Horng Liao,A Hierarchical Dynamic Monitoring Mechanism for Mobile Agent Location,Proc.of the 19th International Conference on Advanced Information Networking and Applications(AINA'05),2005:351-356
    [WANGYUE2003]王跃,刘卫东,王诚,基于agent工作流系统中的异常处理,计算机工程与应用,2003,39[7]:177-179
    [Weissenfels1996]J.Weissenfels,D.Wodtke,G.Weikum,A.Kotz Dittrich,The MENTOR Architecture for Enterprise-wide Workflow Management,in:A.Sheth(ed.),Proceedings of the NSF Workshop on Workflow and Process Automation in Information Systems,Athens,GA.,May 1996,http://lsdis.cs.uga.edu/activities/NSF-workflow/proc_cover.html
    [WfMC1994]The Workflow Management Coalition.The workflow reference model,Workflow Management Coalition,Tech Rep:TC001003,1994.
    [WfMC1996]Workflow Management Coalition Terminology and Glossary (WFMC-TC-1011),Technical Report,Workflow Management Coalition,Brussels,1996
    [WfMC1999]Workflow Terminology & Glossary Version 3.0,WFMC-TC-1011,Feb.1999.
    [WfMC2000]Workflow Management Coalition standard-Interoperability Wf-XML binding document number WFMC-TC-1023,Version 1.0,May,2000.
    [White1997]White,J.E.Telescript technology:mobile agents.In:Bradshaw,eds.Software Agents.Cambridge:AAAI Press/MIT Press,1997.230-242
    [White1996]J.White,Mobile agent white papers,1996,General Magic,http://www.klynch.com/documents/agents/index.html
    [WUG2001]吴刚,吴泉源,王怀民,一种移动智能体的工作流管理模型,计算机辅助设计与图形学学报,2001,13[6]:527-531
    [WUG2002]吴刚,王怀民,吴泉源,一个移动智能体位置管理与可靠通信的算法,软件学报,2002,13[2]:269-273
    [WUXG2008]吴修国,曾广周,韩芳溪,王睿.迁移工作流中的目标规划研究.计算机科学,2008,35[1]:147-150
    [WUZH1999]吴朝晖,潘云鹤,工作流管理技术:机遇和挑战,计算机科学,1999,26[10]:20-23
    [YANGB2003]杨博,刘大有,杨鲲,张朝辉.移动Agent系统的主动通信机制. 软件学报,2003,14[07]:1338-1344
    [YANGGP2005]杨公平,曾广周,卢朝霞,移动agent系统中的排队机制研究,计算机学报,2005,28[11]:1817-1822
    [YANGGP2006]杨公平,曾广周.基于导航的迁移工作流组织与执行.吉林大学学报(工学版),2006,36[5]:819-823
    [YANGWS2001]Weishuai Yang,Shanping Li,Ming Guo,Mobile agent:enhancing workflow interoperability,Info-tech and Info-net,Proceedings of ICII 2001,Volume 5,29 Oct.-1 Nov.2001,Beijing,276-282
    [Yoo2001]J.Yoo,D.Lee,Y.Suh,and D.Lee.Scalable Workflow System Model Based on Mobile Agents.Intelligent Agents:Specification,Modeling and Applications,Lecture Notes in Artificial Intelligence,2132.2001,pp.222-236
    [YUF2003]俞锋,王茜,基于移动代理平台Aglet的柔性工作流的研究与实现,东南大学学报(自然科学版),2003,33[2]:172-176
    [YUZ2003]Yu Zhen,Wu Zhaohui,A mobile-agent based Interorganizational Workflow Management System,Proc.of the 8th International Conference on Computer Supported Cooperative Work in Design,2003,Page(s):389-395
    [ZENGGZ2003]曾广周,党妍.基于移动计算范型的迁移工作流研究.计算机学报,2003,26[10]:1343-1349
    [ZENGGZ2007]曾广周,杨公平,王晓琳.基于Agent能力自信度的任务分配问题研究.计算机学报,2007,30[11]:1922-1929
    [ZENGW2005]曾炜,阎保平,工作流模型研究综述,计算机应用研究,2005,22[5]:11-13,22
    [ZHANGZJ2003]张志君,范玉顺,一种高性能的分布式工作流系统实现框架,计算机集成制造系统-CIMS,2003,9[6]:431-435
    [ZHAOW2003]赵文等,工作流元模型的研究与应用,软件学报,2003,14[6]:1052-1059。
    [ZHOULX2003]周龙骧,刘添添,移动agent综述,计算机应用与软件,2003, 20[11]:19-23
    [ZHUYL2000]朱云龙等,基于Agent的工作流协调模型研究,小型微型计算机系统,2000,21[7]:737-739
    [1] V.Pham, A.Karmouch, Mobile Software Agents: An Overview, IEEE Communications Magazine, 1998, pp. 26-38
    [2] S.Pleisch, A.Schiper. FATOMAS-A Fault-Tolerant Mobile Agent System Based on the Agent-Dependent Approach. Proceedings of International Conference on Dependable Systems and Networks(DSN'01). Goteborg, Sweden, July, 2001

    [3] D.Johansen, K.Marzullo, F.B.Schneider, K.Jacobsen, D.Zagorodnov. NAP: Practical fault-tolerance for itinerant computations. The 17th ICDCS, pages 180-189, June 1999
    [4] K. Rothermel and M. Straßer. A fault-tolerant protocol for providing the exactly-once property of mobile agents. In Proc. of the 17th IEEE Symposium on Reliable Distributed Systems (SRDS), pages 100-108, West Lafayette, Indiana, Oct. 1998.
    [5]M.Straßer,K.Rothermel,C.Maihöfer.Providing Reliable Agents for Electronic Commerce.Lecture Notes in Computer Science.Volume 1402,1998
    [6]L.Silva,V.Batista,and J.Silva.Fault-tolerant execution of mobile agents.In Proc.of the International Conference on Dependable Systems and Networks,pages 135-143,New York,June 2000.
    [7]G.Jin,B.Ahn and K.D.Lee,A Fault-Tolerant Protocol for Mobile Agent,A.Lagana et al,(Eds.):ICCSA 2004,LNCS 3045,pp.993-1001,2004.
    [8]S.Pleisch and A.Schiper,Fault-tolerant mobile agent execution,IEEE Trans.Comput.,vol.52,no.2,pp.209-222,Feb.2003.
    [9]Alan Fedoruk,Ralph Deters:Improving fault-tolerance by replicating agents.AAMAS 2002:737-744
    [10]T.Park,I.Byun,H.Kim,H Y.Yeom,The Performance of Checkpointing and Replication Schemes for Fault Tolerant Mobile Agent Systems,21st IEEE Symposium on Reliable Distributed Systems(SRDS'02),2002,pp.256-261
    [11]S.Choi,M.Baik,H.Kim,J.Yoon,J.Shon,C.Hwang,Region-based Stage Construction Protocol for Fault tolerant Execution of Mobile Agent.Advanced Information Networking and Applications,2004.AINA 2004.18th International Conference on,Volume:2,29-31 March 2004 Pages:499-502
    [12]Rong Xie,Daniela Rus,Cliff Stein,Scheduling Multi-task Agents,Proceedings of the 5th International Conference on Mobile Agents,LNCS 2240,2001,pp.260-276
