Convolution object detection based depth estimation of 3D eye-tracking system
  • Authors: Pan Xinxing; Wang Hui; Chen Ling; Zhu Yongxin; Yang Aolei
  • Keywords: eye-tracking; monocular vision; convolutional neural network; pinhole camera
  • Affiliations: School of Mechatronic Engineering and Automation, Shanghai University; Shanghai Advanced Research Institute, Chinese Academy of Sciences
  • Journal: Chinese Journal of Scientific Instrument (仪器仪表学报; journal code YQXB)
  • Publication date: 2018-10-15
  • Year: 2018
  • Volume: v.39
  • Issue: 10
  • Pages: 244-251 (8 pages)
  • Article ID: YQXB201810029
  • CN: 11-2179/TH
  • Funding: National Key R&D Program of China (2017YFA0206104); National Natural Science Foundation of China (61703262, 61873158); Science and Technology Commission of Shanghai Municipality research program (16511108701, 16YF1403700, 18ZR1415100); key international cooperation project of the Bureau of International Cooperation, Chinese Academy of Sciences (184131KYSB20160018); Zhangjiang Hi-Tech Park Administrative Committee project (2016-14)
  • Language: Chinese
Abstract
With the development of virtual reality (VR) technology, eye-tracking, one of its core technologies, has been attracting increasing attention. Building on conventional 3D gaze estimation techniques, this paper proposes a 3D eye-tracking method that recovers the depth information of the target region through convolutional object detection. Based on image data captured by the world camera of the Pupil head-mounted eye tracker, the TensorFlow convolutional object detection framework is used to recognize the target and measure its width in the image; a functional relationship between the detected width and the actually measured distance is then established to estimate depth information in real time. Experimental data show that, over six fixed-point tests with a sampled image resolution of 1080p, the method's average relative error is only 1.17% and its real-time processing speed reaches 15 frames/s, so it predicts real-time depth information with good accuracy. As eye-tracking technology matures and the cost of eye trackers and related devices falls, this work lays a solid foundation for the further development and application of eye-tracking technology.
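The method in the abstract reduces to two steps: fit a function from detected pixel width to distance, then apply it to the bounding box detected in each world-camera frame. Under the pinhole camera model, an object of fixed physical width W at distance d projects to pixel width w = fW/d, so d = k/w for a single constant k = fW. The Python sketch below illustrates this calibration-and-prediction step under that assumption; the calibration values and the detector call are hypothetical placeholders, not the paper's data or code.

```python
# Minimal sketch of the width-to-depth mapping, assuming the pinhole
# relation w = f*W/d, i.e. d = k/w with k = f*W.
# The calibration numbers below are hypothetical, not the paper's data.
import numpy as np

# Calibration pairs: detected bounding-box width (pixels) at measured distances (cm).
widths_px = np.array([320.0, 213.0, 160.0, 128.0, 107.0, 91.0])
depths_cm = np.array([30.0, 45.0, 60.0, 75.0, 90.0, 105.0])

# Each pair gives an estimate k_i = w_i * d_i; average them into one constant.
k = float(np.mean(widths_px * depths_cm))

def estimate_depth(width_px: float) -> float:
    """Predict distance (cm) from a detected box width (pixels)."""
    return k / width_px

# Evaluate with the average relative error, the metric reported in the paper.
pred = k / widths_px
avg_rel_err = float(np.mean(np.abs(pred - depths_cm) / depths_cm))
print(f"k = {k:.1f}, average relative error = {avg_rel_err:.2%}")

# At runtime, a trained TensorFlow detector would supply the box width per
# world-camera frame, e.g. (hypothetical call, not the paper's code):
#   boxes = detector(frame)                  # normalized [ymin, xmin, ymax, xmax]
#   width_px = (boxes[0][3] - boxes[0][1]) * frame_width
#   depth_cm = estimate_depth(width_px)
```

The paper only states that a function between detected width and measured distance is established, so the fit could in practice be a more general curve that also absorbs lens-distortion residuals; the inverse-proportional form above is an assumption for illustration.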
References
[1] DUCHOWSKI A T. Eye tracking methodology: Theory and practice[M]. Berlin: Springer, 2017.
    [2] CHEN J, TONG Y, GRAY W, et al. A robust 3D eye gaze tracking system using noise reduction[C]. Proceedings of the Symposium on Eye Tracking Research & Applications, 2008: 189-196.
    [3] WANG K, WANG S, JI Q. Deep eye fixation map learning for calibration-free eye gaze tracking[C]. Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, 2016: 47-55.
    [4] CHEN J, JI Q. 3D gaze estimation with a single camera without IR illumination[C]. 19th International Conference on Pattern Recognition, 2008: 1-4.
    [5] FUHL W, TONSEN M, BULLING A, et al. Pupil detection for head-mounted eye tracking in the wild: An evaluation of the state of the art[J]. Machine Vision and Applications, 2016, 27(8): 1275-1288.
    [6] KASSNER M, PATERA W, BULLING A. Pupil: An open source platform for pervasive eye tracking and mobile gaze-based interaction[C]. Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct publication, 2014: 1151-1160.
    [7] LI W, ZHANG X D. Depth image super-resolution reconstruction based on convolution neural network[J]. Journal of Electronic Measurement and Instrumentation, 2017, 31(12): 1918-1928. (in Chinese)
    [8] PFEIFFER T, RENNER P. EyeSee3D: A low-cost approach for analyzing mobile 3D eye tracking data using computer vision and augmented reality technology[C]. Proceedings of the Symposium on Eye Tracking Research and Applications, 2014: 369-376.
    [9] MANSOURYAR M, STEIL J, SUGANO Y, et al. 3D gaze estimation from 2D pupil positions on monocular head-mounted eye trackers[C]. Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, 2016: 197-200.
    [10] GAO F, GE Y S, WANG T, et al. Vision-based localization model based on spatial plane constraints[J]. Chinese Journal of Scientific Instrument, 2018, 39(7): 183-190. (in Chinese)
    [11] HAN Y X, ZHANG ZH SH, DAI M. Monocular vision system for distance measurement based on feature points[J]. Optics and Precision Engineering, 2011, 19(5): 1110-1117. (in Chinese)
    [12] ZHANG F, DONG X CH, WANG Y. Real-time face to camera distance measurement system based on monocular vision[J]. Application Research of Computers, 2013, 30(12): 3866-3869. (in Chinese)
    [13] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015. arXiv: 1409.4842.
    [14] SHRIVASTAVA A, SUKTHANKAR R, MALIK J, et al. Beyond skip connections: Top-down modulation for object detection[J]. arXiv preprint arXiv: 1612.06851, 2016.
    [15] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
    [16] XU X F, LIU H Y. Facial expression recognition based on convolutional neural network[J]. Foreign Electronic Measurement Technology, 2018, 37(1): 106-110. (in Chinese)
    [17] ZHOU X Y, WANG K, LI L Y. Review of object detection based on deep learning[J]. Electronic Measurement Technology, 2017, 40(11): 89-93. (in Chinese)
    [18] ABADI M, AGARWAL A, BARHAM P, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems[J]. arXiv preprint arXiv: 1603.04467, 2016.
    [19] HUANG J, RATHOD V, SUN C, et al. Speed/accuracy trade-offs for modern convolutional object detectors[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017. DOI: 10.1109/CVPR.2017.351.
    [20] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision, 2016: 21-37.
    [21] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv: 1704.04861, 2017.
    [22] FU S Y, WU L SH, CHEN H W, et al. Camera calibration based on multiple distortion factors[J]. Chinese Journal of Scientific Instrument, 2018, 39(2): 248-256. (in Chinese)
    [23] REMPEL D, WILLMS K, ANSHEL J, et al. The effects of visual display distance on eye accommodation, head posture, and vision and neck symptoms[J]. Human Factors, 2007, 49(5): 830-838.
    [24] CHAI T, DRAXLER R R. Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature[J]. Geoscientific Model Development, 2014, 7(3): 1247-1250.
