Research on GPU-Based Data Stream Processing Methods
Abstract
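As a new kind of stream processor, the GPU has the characteristics of the stream processing model: it is inexpensive and widely available, and it offers powerful parallel computing capability and high memory bandwidth. This high-performance computing capability is attracting growing attention from researchers in many fields. A data stream, as a new form of data, is characterized by fast, continuous arrival and potentially huge volume. How to raise the throughput of data stream processing systems and improve the real-time performance of stream processing and mining algorithms has become an important research problem in the data stream field.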
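     This thesis focuses on applying general-purpose computing on graphics processors (GPGPU) to data stream mining; the high-performance processing of high-dimensional data streams among irregular streams is a distinguishing feature of the work. On the theoretical side, it proposes a general framework model for parallel data stream computation on the GPU. Working from two perspectives, regular stream data and high-dimensional data streams, it analyzes the time-consuming parts of stream processing algorithms and studies how to port the serial algorithms to the GPU to improve their performance.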
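     For regular stream data, the thesis studies three-dimensional image reconstruction for electron microscopy, based on the mathematical model of 3D image reconstruction and applied matrix theory. It proposes a GPU-based parallel algorithm and runs simulation experiments on regular projection stream data on the GPU's CUDA platform. The experiments show that, with limited computing resources, the algorithm raises processing speed by a factor of about 50 while preserving image quality.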
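     For high-dimensional data streams, the thesis proposes a GPU-based processing model and a concrete, feasible architecture for high-dimensional data streams among irregular streams. Within this framework, based on the Compute Unified Device Architecture (CUDA), it uses a data cube model and dimensionality reduction to analyze in parallel the canonical correlation of multiple high-dimensional data streams. Theoretical analysis and experiments show that the parallel method can accurately identify, online, the correlations between high-dimensional data streams in synchronous sliding window mode. Compared with a CPU-only method, it has a significant speed advantage, meets the real-time requirements of high-dimensional data streams well, and can serve as a general analysis method widely applicable to high-dimensional data stream mining.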
As a novel kind of stream processor, the GPU has the characteristics of the stream processing model. Because it is an inexpensive, widely available general-purpose parallel computing device with powerful computing capability and high memory bandwidth, its high-performance computing (HPC) capability is drawing more and more attention from scholars in China and abroad. As a new type of data, a data stream arrives fast and continuously and is potentially of immense volume. How to increase the throughput of data stream processing systems and the real-time processing ability of data stream algorithms has become one of the key problems in data stream research.
     This thesis focuses on applying general-purpose computing on graphics processing units (GPGPU) to data streams; in particular, the high-performance processing of high-dimensional data streams among irregular streams is its major feature. It presents a framework model for parallel data stream computing on the GPU. Regular streams and high-dimensional streams are studied separately in order to analyze the time-consuming parts of the processing algorithms and to port the serial CPU algorithms to the GPU.
     For regular data streams, a GPU-based algorithm for electron tomography (ET) 3D image reconstruction is developed from the mathematical model of 3D image reconstruction and matrix theory, and it is simulated on the GPU's CUDA platform with regular projection stream data. The experimental results show that processing speed increases about 50-fold under resource constraints while image quality is maintained.
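The abstract does not name the reconstruction algorithm, so the following is only an illustration of why regular projection data parallelizes well, not the thesis's method. It is a minimal CPU-side sketch of unfiltered back-projection for a 2D slice with parallel-beam geometry and nearest-bin sampling; the function names `detector_bin`, `forward_project`, and `back_project` and the detector layout are assumptions made for this example.

```python
from math import cos, sin, pi


def detector_bin(row, col, theta, size):
    """Nearest detector bin hit by pixel (row, col) at projection angle theta."""
    center = (size - 1) / 2                # image centre in pixel units
    x, y = col - center, row - center
    t = x * cos(theta) + y * sin(theta)    # signed position along the detector
    return round(t + (size - 1))           # detector has 2*size - 1 bins


def forward_project(image, angles):
    """Build a sinogram: one row of detector bins per projection angle."""
    size = len(image)
    sinogram = [[0.0] * (2 * size - 1) for _ in angles]
    for a, theta in enumerate(angles):
        for row in range(size):
            for col in range(size):
                sinogram[a][detector_bin(row, col, theta, size)] += image[row][col]
    return sinogram


def back_project(sinogram, angles, size):
    """Smear every projection back across the image (no filtering step)."""
    recon = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Each output pixel reads the sinogram but writes only its own cell,
            # so the two outer loops could become one GPU thread per pixel.
            recon[row][col] = sum(
                sinogram[a][detector_bin(row, col, theta, size)]
                for a, theta in enumerate(angles)
            )
    return recon
```

Back-projecting the sinogram of a single bright pixel gives an image whose strongest value sits at that pixel, surrounded by blur that a real reconstruction would suppress with filtering or iteration. The point of the sketch is the access pattern: every pixel performs the same fixed amount of work on read-only input, which is the kind of regularity that suits a GPU.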
     For high-dimensional data streams, a GPU-based processing model and a concrete, practical architecture for high-dimensional data streams among irregular streams are proposed. Within this framework, built on the Compute Unified Device Architecture (CUDA), canonical correlation analysis between two multi-dimensional data streams is carried out using a data cube model and a dimensionality-reduction technique. Theoretical analysis and experimental results show that the parallel method can accurately detect correlations between multi-dimensional data streams online in synchronous sliding window mode. Compared with a CPU-only method, it has a significant speed advantage, meets the real-time requirements of high-dimensional data streams well, and can be applied widely in high-dimensional data stream mining.
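To make "synchronous sliding window" concrete, here is a minimal CPU-side sketch; it is not the thesis's CUDA method. It assumes two one-dimensional streams that advance in lockstep and tracks plain Pearson correlation over the most recent samples, as a much simpler stand-in for canonical correlation analysis on high-dimensional streams. The class name `SlidingCorrelation` and its methods are hypothetical.

```python
from collections import deque
from math import sqrt


class SlidingCorrelation:
    """Pearson correlation of two streams over a synchronous sliding window."""

    def __init__(self, window):
        self.window = window
        self.pairs = deque()  # the (x, y) samples currently inside the window
        # Running sums, updated in O(1) as samples enter and leave the window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def _update(self, x, y, sign):
        """Add (sign=+1) or remove (sign=-1) one sample from the running sums."""
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.syy += sign * y * y
        self.sxy += sign * x * y

    def push(self, x, y):
        """Accept one synchronized sample pair and return the current correlation."""
        self.pairs.append((x, y))
        self._update(x, y, +1)
        if len(self.pairs) > self.window:
            old_x, old_y = self.pairs.popleft()  # the oldest pair expires
            self._update(old_x, old_y, -1)
        return self.correlation()

    def correlation(self):
        """Return Pearson's r for the window, or None if it is undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        cov = self.sxy - self.sx * self.sy / n
        var_x = self.sxx - self.sx ** 2 / n
        var_y = self.syy - self.sy ** 2 / n
        if var_x <= 0 or var_y <= 0:
            return None  # a constant window has no defined correlation
        return cov / sqrt(var_x * var_y)
```

Each call to `push` costs constant time regardless of window length, because only the entering and expiring samples touch the running sums. When many stream pairs are monitored at once, each pair's update is independent of the others, which is the sort of data parallelism a GPU could exploit; that mapping is not shown here.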
