Research on Speech Separation Technology in an Auditory Navigation System for Mobile Robots
Abstract
With the continued development of digital signal processing and electro-acoustic technology, and with advances in the artificial intelligence of humanoid robots, the auditory system, an essential part of human perception, has become an important subject in robotics research. Because sound can travel around obstacles, hearing in a robot's multi-sensor acquisition system can complement robot vision, compensating for vision's limited field of view and its inability to see through opaque obstacles. In complex task models, high-quality speech signals are a prerequisite for an intelligent robot to carry out human-robot interaction through hearing. An auditory perception system built on speech separation technology can therefore be applied more widely in real noisy environments, supplying the robot with clean, independent source signals and giving it hearing as acute as the human ear.

This thesis extends the convolutive mixing model used by conventional blind source separation algorithms. By adopting an underdetermined blind source separation algorithm based on a binaural model, it alleviates the real-time problem caused by iterative estimation of the separation matrix and further improves the adaptability of blind source separation to real noisy environments. In addition, to improve the search performance of the robot auditory system in three-dimensional space, the thesis designs an auditory navigation system for a mobile robot based on underdetermined blind source separation and cross-correlation time delay estimation. The system consists of four parts: signal acquisition and preprocessing, speech separation, sound source localization, and robot control. A regular-tetrahedron microphone array picks up the sound emitted by targets in three-dimensional space; an underdetermined blind source separation algorithm based on time-frequency masking builds the masks from a histogram of the mixing-parameter space and estimates the source signals in real time; a mathematical model of the localization geometry determines the bearing of the target; and under the dual control of target-seeking and obstacle-avoidance routines, the robot avoids obstacles and reaches the target position. Experiments show that the auditory localization and navigation system achieves real-time localization of multiple sound sources in three-dimensional space with good accuracy.
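The separation stage described above, which builds time-frequency masks from a histogram of the two-channel mixing parameters, follows the DUET style of underdetermined separation. The following is a minimal sketch of that idea for a two-microphone (binaural) mixture; the STFT settings, histogram resolution, delay clipping, crude peak picking, and the function name `duet_style_separation` are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from scipy.signal import stft, istft

def duet_style_separation(x1, x2, fs, n_sources, nperseg=1024, d_max=1e-3):
    """Separate n_sources speakers from a two-microphone mixture by
    time-frequency masking (DUET-style histogram of mixing parameters)."""
    f, _, X1 = stft(x1, fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs, nperseg=nperseg)

    eps = 1e-10
    ratio = (X2 + eps) / (X1 + eps)
    alpha = np.abs(ratio)                       # inter-channel attenuation
    sym_alpha = alpha - 1.0 / alpha             # symmetric attenuation
    omega = 2.0 * np.pi * f[:, None] + eps      # angular frequency per bin
    delta = np.clip(-np.angle(ratio) / omega,   # inter-channel delay (s),
                    -d_max, d_max)              # limited to a plausible range

    # 2-D histogram of the parameter space; each source should form a peak.
    hist, a_edges, d_edges = np.histogram2d(
        sym_alpha.ravel(), delta.ravel(), bins=50,
        range=[[-3.0, 3.0], [-d_max, d_max]])

    # Crude peak picking: take the n_sources fullest cells as source centres.
    idx = np.argsort(hist.ravel())[-n_sources:]
    pa, pd = np.unravel_index(idx, hist.shape)
    centres_a = 0.5 * (a_edges[pa] + a_edges[pa + 1])
    centres_d = 0.5 * (d_edges[pd] + d_edges[pd + 1])

    # Assign each time-frequency bin to its nearest peak (binary mask),
    # then resynthesise every source from the masked first channel.
    dist = np.stack([(sym_alpha - a) ** 2 + ((delta - d) / d_max) ** 2
                     for a, d in zip(centres_a, centres_d)])
    labels = np.argmin(dist, axis=0)
    sources = []
    for k in range(n_sources):
        _, s_k = istft(np.where(labels == k, X1, 0.0), fs, nperseg=nperseg)
        sources.append(s_k)
    return sources
```

In a setup like the one described in the abstract, `x1` and `x2` would be the two channels of the binaural front end, and the recovered sources would then be handed to the localization stage.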
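The localization stage pairs cross-correlation time delay estimation with a geometric model of the regular-tetrahedron array. One common way to realise this is GCC-PHAT for the pairwise delays followed by a far-field least-squares fit of the arrival direction, which the sketch below follows; the PHAT weighting, the far-field (plane-wave) assumption, the speed-of-sound constant, and the function names are assumptions for illustration rather than the exact model used in the thesis.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in air, m/s

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) with GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)   # PHAT-weighted correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

def far_field_direction(mic_pos, delays_to_ref):
    """Least-squares source direction from TDOAs relative to microphone 0.

    mic_pos       : (M, 3) microphone coordinates in metres
    delays_to_ref : (M-1,) delays of mics 1..M-1 relative to mic 0, seconds
    Returns a unit vector pointing from the array toward the source.
    """
    # Far field: tau_i ≈ (p0 - p_i) · u / c for a unit direction u.
    A = (mic_pos[0] - mic_pos[1:]) / C_SOUND
    u, *_ = np.linalg.lstsq(A, np.asarray(delays_to_ref), rcond=None)
    return u / np.linalg.norm(u)
```

For the tetrahedral array described above, `mic_pos` would hold the four vertex coordinates and the three delays would come from correlating microphones 1-3 against microphone 0; the resulting unit vector gives the bearing that the target-seeking and obstacle-avoidance routines steer toward.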
