Research and Practice on the Infrastructure of a Medical Imaging Cloud Service Platform
Abstract
Medical imaging technology has advanced by leaps and bounds over the past decade or so, with new techniques and devices emerging continuously. 320-slice spiral CT, ultra-high-field MRI, molecular imaging, functional imaging, multi-modality image fusion, and related technologies have greatly enriched physicians' diagnostic tools and improved diagnostic outcomes, but they have also brought several problems:
     1) High-end imaging equipment is expensive, often costing millions to tens of millions of yuan. Many hospitals simply treat equipment grade as a measure of medical quality and compete to acquire high-end devices, keeping healthcare costs high.
     2) A single scan can produce hundreds to thousands of images, yet the film a patient takes away contains only a small fraction of them and supports no parameter adjustment, three-dimensional rendering, or dynamic display, so its diagnostic value is sharply reduced. When a patient transfers to another hospital, physicians often cite this to require re-examination, and these unnecessary repeat examinations further increase patients' medical burden.
     3) X-ray, ultrasound, and similar equipment is already widespread in small hospitals, and some township hospitals in developed coastal areas have even acquired 64-slice or better CT scanners, but they lack skilled imaging diagnosticians, so equipment utilization is low.
     4) Grassroots medical institutions are severely short of funds, equipment, technology, and personnel. Patients crowd into large hospitals for every illness, leaving major hospitals overloaded and community hospitals nearly empty. This imbalance in medical resources is a major cause of the difficulty and high cost of obtaining care.
     5) Imaging diagnosis is difficult, demanding solid foundational knowledge and extensive film-reading experience, and the constant stream of new techniques and devices raises the bar for imaging education. The traditional teaching methods and equipment of medical schools fall far short of the needs of ever-growing enrollments.
     6) The massive image data produced by imaging equipment must be preserved long-term, yet domestic hospitals generally lack remote disaster-recovery and backup measures. A fire, earthquake, tsunami, or other disaster could destroy the data entirely, an irreparable loss.
     Sharing medical resources and coordinating care across a region through network technology is an important means of balancing medical resources and addressing the difficulty and high cost of obtaining care. Remote collaborative imaging diagnosis offers high clinical value, involves genuinely difficult diagnoses, is urgently needed by grassroots hospitals, rests on the stable and mature DICOM standard, and can substantially reduce medical costs through sharing and collaboration, making it the most clinically valuable application in regional medical cooperation. Building a regional medical imaging service platform that provides remote consultation, image-based referral, virtual radiology departments, remote teaching, remote disaster recovery, image hosting, typical-case query, content-based image retrieval, and similar services, thereby fully sharing imaging equipment and diagnostic expertise across the region, is therefore of great significance for balancing medical resources, raising the diagnostic level of grassroots hospitals, improving equipment utilization, improving service quality, and lowering medical costs.
     Building a regional medical imaging service platform and deploying remote collaborative imaging applications is a large-scale systems-engineering effort, and constructing such a platform with the techniques traditionally used for hospital-wide PACS faces major challenges:
     1) High construction cost. PACS image data volumes dwarf those of HIS, LIS, and other clinical systems: a large tertiary (3A) hospital generates several to tens of terabytes of PACS images per year, so a region's imaging data will reach the petabyte (1 PB = 1024 TB) level or beyond. Because the platform must offer remote disaster recovery and image hosting, it has to accommodate the region's entire image volume, and building a PB-scale storage system on a traditional FC SAN (Fibre Channel storage area network) is extremely expensive.
     2) Insufficient performance and scalability. Even the best-performing and most stable FC SAN offers bandwidth and processing capacity that struggle to meet PB-scale processing and transfer requirements. Moreover, when storage devices are added, it is hard to keep the application's directory structure consistent. Storage-virtualization products that pool multiple devices into a unified store do exist and address architectural consistency and dynamic expansion, but for commercial and technical reasons vendors generally virtualize only their own products, so devices from different vendors remain incompatible.
     3) Limited availability. Hospital-wide PACS commonly uses a three-tier "online / near-line / offline" storage model: the most recent images sit on a high-performance FC SAN, somewhat older near-line images on slower IP SAN or NAS devices, and images past a certain age are moved offline to optical or tape libraries. This saves cost and preserves performance for diagnostic use, but it limits overall availability, since offline images cannot be retrieved in real time.
     4) Lack of integrated application software. Regional PACS deployments today mostly reuse the hospital-wide PACS architecture, which suits only fast, stable, secure campus networks and cannot meet application needs on the public Internet, where bandwidth is limited, stability is poor, and firewalls block traffic. In addition, the most important application in regional imaging collaboration, remote imaging consultation, is still largely point-to-point; integrated, cross-platform, highly available software for medical image management and collaboration is missing.
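The petabyte-scale figure in challenge 1 follows from simple arithmetic. The sketch below shows how such a capacity plan might be computed; the facility count, annual growth rate, and retention period are purely illustrative assumptions, not figures from the thesis.

```python
# Back-of-envelope estimate of a regional image archive's size.
# All input figures are illustrative assumptions, not measurements.

TB = 1024 ** 4  # bytes in a terabyte (binary)
PB = 1024 ** 5  # bytes in a petabyte (1 PB = 1024 TB)

hospitals = 200          # assumed imaging facilities in the region
avg_tb_per_year = 5      # assumed average PACS growth per facility per year
retention_years = 10     # assumed retention period, incl. disaster backup

total_bytes = hospitals * avg_tb_per_year * retention_years * TB
print(f"estimated archive size: {total_bytes / PB:.1f} PB")  # → 9.8 PB
```

Even with these modest per-hospital figures, the regional total lands comfortably in the petabyte range, which motivates the cost argument against a pure FC SAN build-out.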
     The rapid rise of cloud computing technology and its service model offers an effective path toward a low-cost, highly available, high-performance, easily scalable regional medical imaging service platform. Our project studies how to build such a regional medical imaging cloud platform over high-speed metropolitan-area networks, the medical-insurance private network, the e-government extranet, and the Internet, delivering SaaS-mode remote imaging services to the region's medical institutions and personnel. A high-performance, reliable, and scalable distributed storage architecture for massive medical images, together with parallel processing techniques, is the foundation and key of the platform and the focus of this thesis.
     As the world's largest search engine and cloud computing provider, Google was the first to face PB-scale data processing. Rather than relying on traditional storage and high-performance computing techniques, it invented the GFS distributed file system and the MapReduce distributed computing model, aggregating the storage and compute resources of tens of thousands of commodity servers to process extremely large datasets efficiently, with great success. The Apache Hadoop project is an open-source implementation of GFS and MapReduce and has become the world's most influential open-source cloud computing platform, with wide adoption. Matching Hadoop's characteristics to the needs of a medical imaging cloud platform, we designed HMISA (Hybrid Medical Image Storage Architecture), a two-tier "online / archive" architecture combining HDFS and an FC SAN that replaces the three-tier "online / near-line / offline" architecture common in regional PACS, and on top of it built MapReduce-based distributed applications such as medical image post-processing.
     The HDFS distributed file system has the following characteristics: 1) it is designed specifically for fast storage and processing of PB-scale data and has been widely proven on massive-data platforms at Yahoo, Facebook, Amazon, Baidu, Taobao, and others; 2) it is highly scalable: simply adding servers yields linear growth in storage capacity, disk I/O throughput, bandwidth, and compute while preserving a consistent file namespace; 3) it is highly redundant, by default keeping replicas of each piece of data on three different nodes; 4) it is built for streaming access (write once, read many, rarely modify), which matches the access pattern of medical image files; 5) beyond storage, the MapReduce framework that accompanies HDFS can harness the CPUs of every server, enabling later data-intensive applications over massive image sets such as preprocessing, format conversion, image fusion, content-based retrieval, and three-dimensional reconstruction.
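The three-replica redundancy in point 3 can be illustrated with a simplified sketch of HDFS's default rack-aware placement: first copy on the writing node, second on a node in a different rack, third on another node in that same remote rack. The cluster layout and node names below are hypothetical, and real HDFS handles many cases (remote writers, insufficient racks) this sketch ignores.

```python
import random

# Hypothetical two-rack cluster; real deployments would be larger.
cluster = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
}

def place_replicas(writer, cluster, rng=random):
    """Simplified default HDFS policy for a 3-replica block write."""
    writer_rack = next(r for r, ns in cluster.items() if writer in ns)
    remote_rack = rng.choice([r for r in cluster if r != writer_rack])
    second = rng.choice(cluster[remote_rack])                      # off-rack copy
    third = rng.choice([n for n in cluster[remote_rack] if n != second])
    return [writer, second, third]

replicas = place_replicas("n1", cluster)
print(replicas)  # e.g. ['n1', 'n5', 'n4']
```

Keeping two of the three copies in one remote rack trades a little fault isolation for lower cross-rack write traffic, which is the design rationale usually given for this policy.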
     Hadoop nevertheless poses the following problems for a medical image store: 1) its design is optimized for large files, with a default block size of 64 MB, whereas typical CT and MRI images are around 512 KB each and a single scan produces roughly 100-200 images; storing such a mass of small files directly in HDFS generates so much metadata that memory consumption on the HDFS master node (the NameNode) balloons and cluster performance degrades. 2) HDFS is not designed for low-latency real-time applications: its write performance is far below its read performance, making it a poor fit for real-time PACS workflows that must fetch images quickly and write diagnostic reports.
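The small-file cost can be made concrete with a commonly cited rule of thumb that each file and each block consumes on the order of 150 bytes of NameNode heap. The study counts and the exact per-object cost below are illustrative assumptions; the point is the ratio between storing raw slices and storing one merged file per study.

```python
# Rough NameNode heap estimate; all figures are illustrative assumptions.
BYTES_PER_OBJECT = 150            # commonly cited per-file/per-block rule of thumb
GB = 1024 ** 3

studies = 100_000                 # assumed number of studies to archive
images_per_study = 150            # ~100-200 CT/MRI slices per study
image_bytes = 512 * 1024          # typical slice size (~512 KB)

def namenode_bytes(n_files, file_bytes, block_bytes=64 * 1024 * 1024):
    """One metadata object per file plus one per occupied 64 MB block."""
    blocks_per_file = max(1, -(-file_bytes // block_bytes))  # ceiling division
    return n_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

raw = namenode_bytes(studies * images_per_study, image_bytes)
packed = namenode_bytes(studies, images_per_study * image_bytes)

print(f"raw small files:        {raw / GB:.2f} GB of NameNode heap")
print(f"one merged file/study:  {packed / GB:.3f} GB")
```

Under these assumptions the merged layout needs roughly one hundredth of the metadata memory, which is the motivation for the S-DICOM format introduced next.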
     To address Hadoop's unsuitability for storing small medical image files, we used Hadoop's SequenceFile format to design S-DICOM, a serialized medical image format suited to HDFS: all images produced by one examination of one patient are merged, as key/value pairs, into a single serialized file. This greatly improves HDFS performance and prevents excessive memory consumption on the metadata server (the NameNode). Key/value data is also the ideal input structure for the MapReduce distributed computing platform, easing later data-intensive applications over the image files.
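The key/value idea behind S-DICOM can be sketched as follows. This is an illustrative length-prefixed serialization in plain Python, not Hadoop's actual SequenceFile binary format (which adds headers, sync markers, and optional compression), and the key naming scheme is hypothetical.

```python
import io
import struct

def pack_study(records):
    """Merge (key, image_bytes) pairs for one study into a single blob."""
    out = io.BytesIO()
    for key, value in records:
        k = key.encode("utf-8")
        out.write(struct.pack(">II", len(k), len(value)))  # key/value lengths
        out.write(k)
        out.write(value)
    return out.getvalue()

def unpack_study(blob):
    """Recover the (key, image_bytes) pairs from a packed study."""
    buf, records = io.BytesIO(blob), []
    while True:
        header = buf.read(8)
        if not header:                    # clean end of blob
            return records
        klen, vlen = struct.unpack(">II", header)
        records.append((buf.read(klen).decode("utf-8"), buf.read(vlen)))

# Hypothetical keys: patient / study / slice; values stand in for DICOM bytes.
slices = [(f"patient001/study20/slice{i:03d}.dcm", bytes(16)) for i in range(3)]
blob = pack_study(slices)
assert unpack_study(blob) == slices
```

One study then costs the NameNode a single file's worth of metadata instead of one entry per slice, while the per-slice keys remain addressable by downstream MapReduce jobs.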
     HDFS alone is unsuitable for real-time applications, but it is low-cost, easily scalable, high-performance, and highly reliable, while traditional centralized storage (FC SAN) excels at fast reads and writes of small files. Combining the strengths of both, we designed HMISA, a hybrid FC SAN + HDFS architecture that simplifies the common three-tier "online / near-line / offline" PACS storage into a two-tier "online / archive" architecture. Images less than one year old are kept in their original DICOM format in the first-tier FC SAN "online" store, meeting the low-latency requirements of real-time PACS tasks such as film reading and report writing. Older images are converted to S-DICOM and moved to the second-tier HDFS "archive" store. An SDFO (S-DICOM File Operator) access component hides the low-level details of image reads and writes and exposes a uniform image query, read, and write interface to the SaaS-mode imaging applications and DICOM components above it.
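A minimal sketch of the SDFO idea follows: one read/write interface that hides whether a study lives in the "online" tier (less than a year old, FC SAN) or the "archive" tier (HDFS). Both tiers are stand-in dictionaries here; the real component would talk to the SAN filesystem and to HDFS, and the class and method names are this sketch's own, not the thesis's API.

```python
from datetime import date, timedelta

ONLINE_WINDOW = timedelta(days=365)  # one-year boundary between the tiers

class SDFO:
    """Illustrative facade over the two HMISA storage tiers."""

    def __init__(self):
        self.online = {}   # stand-in for the FC SAN tier (raw DICOM)
        self.archive = {}  # stand-in for the HDFS tier (S-DICOM)

    def write(self, study_id, study_date, data, today=None):
        today = today or date.today()
        tier = self.online if today - study_date < ONLINE_WINDOW else self.archive
        tier[study_id] = data

    def read(self, study_id):
        # Callers never see which tier answered.
        if study_id in self.online:
            return self.online[study_id]
        return self.archive[study_id]

sdfo = SDFO()
today = date(2011, 6, 1)
sdfo.write("s1", date(2011, 5, 1), b"recent", today=today)  # lands online
sdfo.write("s2", date(2009, 5, 1), b"old", today=today)     # lands in archive
assert sdfo.read("s1") == b"recent" and sdfo.read("s2") == b"old"
```

Because routing lives entirely behind one interface, the upper-layer SaaS applications need no knowledge of the tier boundary, and the boundary itself (here one year) stays a single tunable constant.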
     Hadoop's built-in MapReduce framework shields developers from the hardest details of parallel computing, such as task scheduling, node fault tolerance, inter-node communication, and load balancing, greatly lowering the difficulty of building distributed systems. Its "move the computation to the data" design also suits data-intensive distributed processing of massive medical images particularly well. On top of our distributed storage architecture we wrote MapReduce-based image-processing programs, including batch DICOM-to-JPEG conversion, batch removal of patient-identifying information, batch thumbnail generation, and distributed import and query of network access logs, and on a test cluster measured both distributed-computing performance and the influence of several parameters. The results show that a Hadoop cluster effectively exploits the compute capacity of its storage nodes, far outperforms a single machine, and achieves linear growth in storage capacity and processing speed through horizontal scaling (scale-out).
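The programming model behind those jobs can be shown with a tiny in-memory map/shuffle/reduce, here applied to the access-log query mentioned above (accesses per user). The log format and field order are hypothetical, and a real job would of course use the Hadoop MapReduce API across many nodes rather than a single Python process.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit (user, 1) for each image-access log line."""
    user, _resource = line.split()
    yield user, 1

def reduce_phase(user, counts):
    """Sum the counts shuffled together for one user."""
    yield user, sum(counts)

def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)                        # shuffle: group by key
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(chain.from_iterable(
        reducer(k, vs) for k, vs in sorted(groups.items())))

logs = [
    "dr_li patient001/slice001.dcm",
    "dr_li patient001/slice002.dcm",
    "dr_wang patient002/slice001.dcm",
]
print(run_mapreduce(logs, map_phase, reduce_phase))  # {'dr_li': 2, 'dr_wang': 1}
```

The batch jobs in the thesis fit the same shape: DICOM-to-JPEG conversion and de-identification are essentially map-only passes over S-DICOM key/value records, with the framework supplying scheduling, fault tolerance, and data locality.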
     In summary, the main features and innovations of this thesis are:
     1) Analyzed the requirements, technical progress, and main obstacles of regional medical image sharing and collaboration, and designed the overall technical architecture of a regional medical imaging cloud service platform, covering its logical, network, storage, and software architectures.
     2) Designed HMISA (Hybrid Medical Image Storage Architecture), a two-tier "online / archive" medical image storage architecture combining an FC SAN with HDFS, which resolves the performance, scalability, and availability problems of the three-tier "online / near-line / offline" architecture common in regional PACS. Designed the S-DICOM archival image format to overcome HDFS's unsuitability for storing and processing large numbers of small files. Developed the SDFO access component, which hides the low-level image I/O of HMISA and provides a uniform image query, read, and write interface to the SaaS-mode imaging applications and DICOM components above it.
     3) Designed and implemented MapReduce-based distributed image-processing programs, including DICOM-to-JPEG conversion, removal of patient-identifying information (de-identification), and batch thumbnail generation, and benchmarked them on a Hadoop cluster. The results show that a Hadoop cluster easily exceeds the performance limits of a single server and meets a region's needs for fast storage, retrieval, and processing of massive medical image data.
