Abstract
Processing lidar point cloud data on a CPU alone can no longer meet real-time requirements. To address this, the NVIDIA Tegra X1 is adopted as a heterogeneous computing platform to accelerate the lidar data processing algorithm. Taking into account both the hardware architecture and the characteristics of the algorithm, coarse-grained parallelism is used to resolve the load imbalance that arises during GPU optimization, while zero copy and data localization are applied for fine-grained data optimization. Experimental results show that, compared with the industrial PC currently used on the intelligent vehicle, the optimized lidar data processing algorithm runs 5 to 6 times faster, improving the real-time performance of lidar data processing on the intelligent vehicle.
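The abstract does not say how the work is partitioned, so the following is only an illustrative sketch of why coarser work units can reduce load imbalance. It assumes, hypothetically, that lidar points are binned into grid cells with very uneven point counts and that each worker (for example, a GPU thread block) processes a contiguous range of cells. The function names and the sample counts are invented for illustration and are not taken from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def split_even_cells(counts, workers):
    """Naive split: every worker gets the same number of cells."""
    n = len(counts)
    bounds = [i * n // workers for i in range(workers + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(workers)]


def split_even_points(counts, workers):
    """Coarse-grained split: contiguous cell ranges with similar point totals."""
    prefix = list(accumulate(counts))  # prefix[i] = points in cells 0..i
    total = prefix[-1]
    cuts = [0]
    for w in range(1, workers):
        target = total * w / workers
        # Cut just after the first cell whose running total reaches the target.
        cut = bisect_left(prefix, target) + 1
        cuts.append(max(cut, cuts[-1]))
    cuts.append(len(counts))
    return [(cuts[i], cuts[i + 1]) for i in range(workers)]


def max_load(counts, ranges):
    """Points handled by the busiest worker; this bounds the total run time."""
    return max(sum(counts[a:b]) for a, b in ranges)


# Hypothetical per-cell point counts: sparse cells first, dense cells last.
counts = [10, 20, 30, 40, 500, 400, 300, 200]

naive = split_even_cells(counts, 4)
balanced = split_even_points(counts, 4)
print("naive max load:   ", max_load(counts, naive))
print("balanced max load:", max_load(counts, balanced))
```

With these sample counts, the busiest worker handles 900 points under the naive split and 600 under the point-balanced split. On a real GPU the appropriate work unit depends on the algorithm and the memory layout, which the abstract does not specify.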