Research on Acceleration Techniques for Cone-Beam CT Three-Dimensional Reconstruction Algorithms
Abstract
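A cone-beam CT system equipped with a flat-panel detector acquires projection data for many slices in a single rotation. Compared with two-dimensional parallel-beam and fan-beam CT, it offers markedly shorter scan times, higher spatial resolution and more efficient use of the X-ray beam, and it is now widely used. The practical approximate reconstruction algorithm for a circular scanning trajectory proposed by Feldkamp, Davis and Kress (the FDK algorithm) is currently the most common algorithm on commercial cone-beam CT machines. However, as area detectors gain more detector elements and scan faster, and because cone-beam reconstruction algorithms are complex, the computation and data transfer required for three-dimensional image reconstruction keep increasing and reconstruction times keep lengthening. Reconstruction schemes that rely on the CPU alone can no longer meet the demands of modern engineering applications, so finding a suitable way to speed up cone-beam CT reconstruction has both academic and practical value.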
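     This paper pursues two lines of research. The first studies the acceleration of cone-beam CT image reconstruction from the standpoint of the reconstruction algorithm. The second studies the use of Compute Unified Device Architecture (CUDA) technology on graphics processors to accelerate the FDK algorithm.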
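     In the study of cone-beam CT reconstruction algorithms, this paper examines the FDK algorithm in some depth and makes three contributions. First, it analyzes the parallelism of the FDK algorithm: the algorithm is computationally expensive but parallel, and the work can be partitioned by rotation angle or by slice of the reconstructed object. Second, because FPGA-based cone-beam CT image reconstruction has long been an active research topic in industrial CT, it studies a pipelined architecture for the back-projection step of FDK and finds that the architecture allows fast back-projection at a low degree of parallelism; simulation on a computer indicates that the architecture can be implemented on an FPGA. Third, it studies a fixed-point version of the FDK back-projection and tests it on a computer; the results show that the error of the fixed-point algorithm relative to the floating-point algorithm is below 1%.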
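The partitioning by rotation angle described above can be illustrated with a small sketch. It is not the FDK implementation from this work: it assumes a toy 2D parallel-beam geometry with nearest-neighbour lookup, omits weighting and filtering, and all function names are hypothetical. It only shows that back-projecting blocks of angles separately and summing the partial images gives the same result as one pass over all angles, which is what makes the step parallel.

```python
import math

def backproject(sinogram, angles, size):
    """Accumulate a size x size image from the given projection rows.

    sinogram[k] holds the detector samples for angles[k] (radians).
    Toy 2D parallel-beam geometry, nearest-neighbour lookup; the FDK
    weighting and filtering steps are deliberately left out.
    """
    n_det = len(sinogram[0])
    centre = (size - 1) / 2.0
    det_centre = (n_det - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for row, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for iy in range(size):
            for ix in range(size):
                # Detector coordinate hit by the ray through pixel (ix, iy).
                t = (ix - centre) * c + (iy - centre) * s + det_centre
                j = round(t)
                if 0 <= j < n_det:
                    image[iy][ix] += row[j]
    return image

def add_images(a, b):
    """Element-wise sum of two images of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_by_angle(sinogram, angles, size, n_parts):
    """Back-project each block of angles on its own, then sum the partial images.

    Each block is independent of the others, so in a real system the
    blocks could be handed to separate workers.
    """
    total = [[0.0] * size for _ in range(size)]
    step = math.ceil(len(angles) / n_parts)
    for start in range(0, len(angles), step):
        part = backproject(sinogram[start:start + step],
                           angles[start:start + step], size)
        total = add_images(total, part)
    return total
```

Partitioning by slice is simpler still, since each slice writes to its own output and no final summation is needed. With general floating-point data the partitioned sum may differ from the single pass by rounding, because the order of additions changes.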
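     In the applied study of hardware-accelerated reconstruction, this paper builds on the parallel structure of the FDK algorithm and proposes an acceleration scheme based on CUDA technology for graphics processors. The scheme uses a graphics card built on this new hardware and software architecture and, through the architecture's programming model, runs the weighting, filtering and back-projection steps of FDK on the GPU's stream processors, giving a fast implementation of the algorithm. Experiments show that for a 512³ volume in single-precision floating point, with 512 angular positions per rotation, the reconstruction time falls to under one minute and the transfer time between GPU memory and host memory is under 1 second. Compared with a CPU-only reconstruction, the scheme achieves a speedup of about 250 times.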
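To give a sense of scale for the figures above, the following sketch works out the size of a 512³ single-precision volume and the number of voxel updates in the back-projection. The helper names are hypothetical, and the rate is only a lower bound implied by the stated one-minute limit, assuming one update per voxel per angle; it is not a measured throughput.

```python
def volume_bytes(n, bytes_per_voxel=4):
    """Size in bytes of an n x n x n volume (4 bytes = single-precision float)."""
    return n ** 3 * bytes_per_voxel

def backprojection_updates(n, n_angles):
    """Voxel updates if every voxel is updated once per rotation angle."""
    return n ** 3 * n_angles

def min_rate(count, seconds):
    """Lowest average rate that finishes `count` operations within `seconds`."""
    return count / seconds

size_mib = volume_bytes(512) / 2 ** 20           # 512.0 MiB for the volume
updates = backprojection_updates(512, 512)       # 2**36, about 6.9e10 updates
updates_per_second = min_rate(updates, 60.0)     # above 1.1e9 updates per second
```

If the whole reconstructed volume is what crosses between GPU memory and host memory (an assumption, since the abstract does not say what is transferred), a transfer time under one second implies a sustained rate above 512 MiB per second.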
A cone-beam CT system with a flat-panel detector can acquire multi-slice projection data in a single circular scan. Compared with two-dimensional parallel-beam and fan-beam CT, such a system, which is now widely used, offers short scanning time, high spatial resolution and efficient use of radiation. The FDK algorithm, proposed by Feldkamp, Davis and Kress for a circular scanning trajectory, is a practical approximate reconstruction algorithm and is currently the most common algorithm in commercial cone-beam CT systems. However, as the number of detector elements and the scanning speed of flat-panel systems increase, and given the complexity of three-dimensional reconstruction algorithms, the computation and data transfer involved in three-dimensional image reconstruction keep growing and reconstruction takes longer and longer. Using the CPU alone for reconstruction can no longer meet the requirements of modern engineering applications. Studying how to improve computational speed and finding a suitable alternative therefore has both academic and practical value.
     This paper covers two aspects. The first is a theoretical study of speeding up cone-beam CT image reconstruction from the algorithmic point of view. The second is the application of Compute Unified Device Architecture (CUDA) on graphics processors to accelerate the FDK algorithm.
     In the study of cone-beam CT reconstruction algorithms, this paper addresses three topics based on the theory of the FDK algorithm. First, it discusses the principle of parallel computation in the FDK algorithm: although the computation is time-consuming, it can be carried out in parallel, partitioned by rotation angle or by slice of the reconstructed object. Second, because the use of FPGAs for cone-beam CT image reconstruction has been an active topic in the field, it discusses a pipelined architecture for the back-projection step that allows fast computation at a low degree of parallelism. Simulation results on a computer show that the architecture can be implemented on an FPGA. Third, it discusses a fixed-point back-projection for the FDK algorithm. Computer experiments show that its error relative to the floating-point algorithm is less than 1%.
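The comparison between fixed-point and floating-point back-projection can be sketched for the interpolation step alone. The abstract does not give the fixed-point format, so the bit widths below (8 fractional bits for the detector coordinate, 12 for the samples), the synthetic positive test row and all names are assumptions made for illustration only.

```python
import math

FRAC_BITS = 8      # assumed fractional bits for the detector coordinate
SAMPLE_BITS = 12   # assumed fractional bits for the projection samples
ONE = 1 << FRAC_BITS

def interp_float(samples, pos):
    """Linear interpolation between neighbouring detector samples (float)."""
    i = int(pos)
    w = pos - i
    return (1.0 - w) * samples[i] + w * samples[i + 1]

def interp_fixed(samples_q, pos_q):
    """Same interpolation using only integer multiplies, adds and shifts."""
    i = pos_q >> FRAC_BITS      # integer part of the coordinate
    w = pos_q & (ONE - 1)       # fractional part, 0 .. ONE - 1
    return ((ONE - w) * samples_q[i] + w * samples_q[i + 1]) >> FRAC_BITS

def max_relative_error():
    """Largest relative difference between the two versions on a test row."""
    # Synthetic, strictly positive projection row (illustrative data only).
    samples = [1.0 + 0.5 * math.sin(0.3 * k) for k in range(64)]
    samples_q = [round(v * (1 << SAMPLE_BITS)) for v in samples]
    worst = 0.0
    for k in range(1, 160):
        pos = 0.37 * k
        ref = interp_float(samples, pos)
        fixed = interp_fixed(samples_q, round(pos * ONE)) / (1 << SAMPLE_BITS)
        worst = max(worst, abs(fixed - ref) / abs(ref))
    return worst
```

Calling `max_relative_error()` returns the worst-case relative difference for this input. Integer-only arithmetic of this kind maps directly onto FPGA logic, which is one reason a fixed-point formulation is attractive there. Real filtered projections contain values near zero and negative values, for which a relative error is less meaningful, so an error figure for actual data would need a different normalization.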
     In the applied research on hardware-accelerated reconstruction, this paper proposes a speed-up method that uses CUDA on the graphics processor (GPU). The method uses a graphics card based on the new hardware and software architecture. Through the architecture's programming model, the weighting, filtering and back-projection steps are carried out by the stream processors in the GPU to accelerate the FDK algorithm. The results show that a 512³ volume can be reconstructed from 512 rotation angles in 32-bit floating point in less than one minute, and that the transfer time between GPU memory and host memory is less than one second. Compared with a CPU-only method, this method is faster while maintaining good image quality.
