Research on 3G Mobile Phone Voice Control Based on a Multimodal Health Information Web Portal
Abstract
Health information systems can now be accessed in a variety of ways, including from mobile devices; however, access via speech recognition has so far been limited. The main objective of this thesis is to research and provide a method for using speech recognition together with a third-generation (3G) mobile phone interface to view and access health information. Unlike previous research, a distributed multimodal system is proposed, in which components are distributed apart but work together in synchronisation. To achieve this distribution, synchronisation, and interoperability, the thesis concentrates on adhering to and implementing international telecommunication, web, and multimodal architecture standards; background on these standards and protocols is provided in the references.
     The multimodal system consists of two modality components, i.e. two modes of interaction. A mobile web browser was chosen to act as the 3G mobile interface and to form the graphical modality. The voice modality consists of a speech framework that performs the speech recognition on a remote server rather than on the phone. A simulated, transformed web-based health portal serves as the interface to the health information data, and an interaction mechanism was implemented to synchronise viewing of this portal through the graphical modality with speech input from the voice modality. The framework also had to consider additional technologies to transform the web portal data and to update the graphical modality in real time.
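     As a concrete illustration of how such an interaction mechanism can be driven by the cited standards, the following minimal Java sketch uses Apache Commons SCXML [34] to run an interaction-manager state chart and feed it events from both modalities. The class name, the state chart resource, and the event names (voice.result, gui.update) are illustrative assumptions, not details taken from the thesis.

    import java.net.URL;
    import org.apache.commons.scxml.SCXMLExecutor;
    import org.apache.commons.scxml.TriggerEvent;
    import org.apache.commons.scxml.env.SimpleDispatcher;
    import org.apache.commons.scxml.env.SimpleErrorHandler;
    import org.apache.commons.scxml.env.SimpleErrorReporter;
    import org.apache.commons.scxml.env.jexl.JexlContext;
    import org.apache.commons.scxml.env.jexl.JexlEvaluator;
    import org.apache.commons.scxml.io.SCXMLParser;
    import org.apache.commons.scxml.model.SCXML;

    // Illustrative interaction manager: a state chart decides how speech
    // results and browser updates are kept in synchronisation.
    public class InteractionManager {
        private final SCXMLExecutor executor;

        public InteractionManager(URL stateChart) throws Exception {
            // Parse the SCXML document modelling the dialogue/synchronisation states.
            SCXML scxml = SCXMLParser.parse(stateChart, new SimpleErrorHandler());
            executor = new SCXMLExecutor(new JexlEvaluator(),
                    new SimpleDispatcher(), new SimpleErrorReporter());
            executor.setStateMachine(scxml);
            executor.setRootContext(new JexlContext());
            executor.go(); // enter the initial state
        }

        // Delivered by the remote speech server when recognition completes.
        public void onSpeechResult(String utterance) throws Exception {
            executor.triggerEvent(new TriggerEvent("voice.result",
                    TriggerEvent.SIGNAL_EVENT, utterance));
        }

        // Delivered by the mobile web browser when the portal view changes.
        public void onBrowserEvent(String pageId) throws Exception {
            executor.triggerEvent(new TriggerEvent("gui.update",
                    TriggerEvent.SIGNAL_EVENT, pageId));
        }
    }

     A transition in the state chart would then, for example, push an updated portal page to the browser whenever a voice.result event arrives, in the spirit of the AJAX push/pull techniques compared in [51].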
     The final integrated and implemented prototype system is then presented. The results show the voice and graphical modalities being synchronised after a web-initiated session and demonstrate that distributed components must conform to the standards in order to interoperate with the system. This standards-based multimodal system can now be used for continuing speech recognition and health web portal research. Suggestions are provided on how the system can be fully integrated with a standardised health information portal and on how physician workflow could then be evaluated in future research.
     The main contribution of this thesis is a distributed, standards-based multimodal interaction system in which components with standardised interfaces and communication achieve mutual interoperability. Components can be developed independently, without knowledge of each other's internal details, as long as the standardised interfaces are conformed to; distributing the components also allows more powerful components to carry out the data processing for components with fewer resources. For future projects, such a standardised distributed multimodal interaction system means that not just the graphical and speech modalities but also sensor modalities from other research teams can co-operate, allowing seamless, barrier-free interaction with health information systems.
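     To make the standardised-interface argument concrete, the sketch below shows the kind of small, W3C MMI-style life-cycle contract [29] an interaction manager could program against; any modality component implementing it can be swapped in without exposing its internals. The Java method names and signatures are illustrative assumptions on my part; the actual MMI specification defines these as XML life-cycle events rather than a Java API.

    // Illustrative contract modelled on the W3C MMI life-cycle events [29].
    // Graphical, voice, or future sensor modalities would each implement it.
    public interface ModalityComponent {
        void prepare(String contentUrl);                 // cf. MMI PrepareRequest
        void start();                                    // cf. MMI StartRequest
        void cancel();                                   // cf. MMI CancelRequest
        void notifyExtension(String event, String data); // cf. MMI ExtensionNotification
    }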
References
[1] Janine Sturm, "On the usability of Multimodal interaction for mobile access to information services," Doctoral thesis, 2005.
[2] B. Suhm, B. Myers, and A. Waibel, "Multimodal error correction for speech interfaces," ACM Transactions on Computer-Human Interaction, vol. 8, no. 1, pp. 60-98, March 2001.
[3] Scott Durling and Jo Lumsden, "Speech recognition use in healthcare applications," in International Conference on Mobile Computing and Multimedia, Linz, Austria, 2008, pp. 473-478.
[4] Ericka Chickowski. (2009, August) Speech Recognition May Speed EMR Adoption. [Online]. http://www.smartertechnology.com/c/a/Technology-For-Change/Speech-Recognition-May-Speed-EMR-Adoption/1/
[5] American Health Information Management Association. (2003) Speech Recognition in the Electronic Health Record. [Online]. http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_022192.hcsp?dDocName=bok1_022192
[6] W3C. (2007, June 19) Voice Extensible Markup Language (VoiceXML) 2.1. [Online]. http://www.w3.org/TR/voicexml21/
[7] Dan Bohus, Antoine Raux, and Thomas K. Harris, "Olympus: an open-source framework for conversational spoken language interface research".
[8] Antonio Coronato, "Middleware Services for Multimodal Interactions in Smart Environments," in Sixth Annual IEEE International Conference on Pervasive Computing and Communications, 2008, pp. 515-519.
[9] Miguel Angel Salichs, Javi F. Gorostiza, and Alaa M. Khamis, "Multimodal Human-Robot Interaction Framework for a Personal Robot".
[10] Dararat Kaitrungrit, Matthew N. Dailey, and Chai Wutiwiwatchai, "Thai Voice Application Gateway," in Proceedings of ECTI-CON, 2008, pp. 173-176.
[11] A Taxonomy of Multimodal Interaction in the Human Information Processing System. [Online]. http://hwr.nici.kun.nl/~miami/taxonomy/taxonomy.html
[12] D. Tzovaras and S. Petsas, "WAP-based Personalised Health Care Services," in Proceedings of the 23rd Annual EMBS International Conference, Istanbul, Turkey, 2001.
[13] Liferecord.com. [Online]. http://www.liferecord.com/emr/news/files/d7e78a381dfdd757bf66a80074a58cd3-1.html
[14] [Online]. http://stanford.wellsphere.com/general-medicine-article/i-moan-u-moan-we-all-moan-for-iphone/511897
[15] A. Mohan, "Voice Enabled Request and Response for Mobile Devices Supporting WAP Protocol," IEEE VTC, 2000.
[16] Mohan et al., "A Strategy for Voice Browsing in 3G Wireless Networks," in EUROCON 2001, International Conference on Trends in Communications, Technical Program, 2001.
[17] Horst Rossler, Jurgen Sienel, and Wieslawa Wajda, "Multimodal Interaction for Mobile Environment," Alcatel SEL AG Research and Innovation.
[18] Opera Software Multimodal Browser. [Online]. http://www-306.ibm.com/software/pervasive/
[19] Larson Tech. [Online]. http://www.larson-tech.com/MM-Projects/Demos.htm
[20] Z. C. Carrion, On the Development of Adaptive and Portable Spoken Dialogue Systems: Emotion Recognition, Language Adaptation and Field Evaluation. University of Granada, 2008.
[21] S. Seneff et al., "Galaxy-II: A reference architecture for conversational system development," in Proc. ICSLP, 1998.
[22] Matus Pleva, Jan Papaj, L'ubomir Dobos, Jozef Juhar, and Anton Cizmar, "MOBILTEL - Mobile Multimodal Telecommunications dialogue system based on VoIP telephony," Journal of Electrical and Electronics Engineering, vol. 2, no. 2, pp. 134-137, 2009.
[23] Alexander Gruenstein, Stephanie Seneff, and Chao Wang, "Scalable and Portable Web-Based Multimodal Dialogue Interaction with Geographical Databases," in Interspeech 2006 ICSLP, 2006, pp. 453-456.
[24] James C. Ferrans and Jonathan Engelsma, "Software Architectures for Networked Mobile Speech Applications," in Automatic Speech Recognition on Mobile Devices and over Communication Networks. Springer London, 2008, ch. 13, pp. 279-299.
[25] Giuseppe Di Fabbrizio, Thomas Okken, and Jay G. Wilpon, "A Speech Mashup Framework for Multimodal Mobile Services," in Proceedings of the 2009 International Conference on Multimodal Interfaces, Cambridge, Massachusetts, USA, 2009, pp. 71-78.
[26] IBM & Motorola. (2005, July) IETF 66 OMA Architecture Document "Multimodal and Multi-device Architecture". [Online]. http://member.openmobilealliance.org/ftp/public_documents/BAC/MAE/Permanent_documents/OMA-AD-MMMD-V1_0-20060612-D.zip
[27] Open Mobile Alliance. (2006) OMA multimodal and multi-device enabler architecture. [Online]. http://www.openmobilealliance.org/Technical/release_program/mmmd_v1_0.aspx OMA-AD-MMMD-V1_0-20081024-A
[28] Ingmar Kliche, "The W3C Multimodal Architecture," Deutsche Telekom Laboratories, ETSI Workshop: Multimodal Interaction on Mobile Devices, Nov 19, 2008.
[29] W3C. (2009, December) Multimodal Architecture and Interfaces. [Online]. http://www.w3.org/TR/mmi-arch
[30] David Harel and M. Politi, Modeling Reactive Systems with Statecharts: The STATEMATE Approach. McGraw-Hill, 1998. [Online]. http://www.wisdom.weizmann.ac.il/~dharel/reactive_systems.html
[31] OMG. (2009) UML Specification Version 2.0. [Online]. http://www.omg.org/uml/
[32] "Voice Browser" Activity. [Online]. http://www.w3.org/Voice/
[33] Jim Barnett et al., "State Chart XML (SCXML): State Machine Notation for Control Abstraction," October 2009. [Online]. http://www.w3.org/TR/scxml/
[34] Commons SCXML. [Online]. http://commons.apache.org/scxml/
[35] W3C. (2010, March) Voice Extensible Markup Language (VoiceXML) 3.0. [Online]. http://www.w3.org/TR/voicexml30/
[36] W3C. CCXML 1.0 specification. [Online]. http://www.w3.org/TR/ccxml
[37] Phonologies. (2010) Oktopous™ ccXML Open Source PIK v1.1. [Online]. http://phonologies.com/okto_os.php#fred
[38] W3C. (2009, February) EMMA: Extensible MultiModal Annotation markup language. [Online]. http://www.w3.org/TR/emma/
[39] World Wide Web Consortium. (2008, July 29) XHTML Basic 1.1, W3C Recommendation. [Online]. http://www.w3.org/TR/xhtml-basic/
[40] Andrew Trice. (2010, May) Flash Player 10.1 for Android at Google I/O. [Online]. http://www.insideria.com/2010/04/flash-player-101-for-android-a.html
[41] IETF. (2006, April) RFC 4463. [Online]. http://www.ietf.org/rfc/rfc4463.txt
[42] IETF. (2009, August) Media Resource Control Protocol Version 2 (MRCPv2). [Online]. http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-20
[43] IETF. (2002, June) SIP: Session Initiation Protocol. [Online]. http://tools.ietf.org/html/rfc3261
[44] IETF. (2003, July) RTP: A Transport Protocol for Real-Time Applications. [Online]. http://tools.ietf.org/html/rfc3550
[45] ETSI STQ Aurora Working Group. The Impact of Distributed Speech Recognition on multi-modal interfaces to the Mobile Web. [Online]. http://www.w3.org/2000/09/Papers/Aurora.html
[46] ETSI. Distributed Speech Recognition. [Online]. http://www.etsi.org/WebSite/Technologies/DistributedSpeechRecognition.aspx
[47] Quentin Loo and Zhiqian Ye, "Assessment of the diagnostic quality of a new medical examination network environment," in International Conference on Information Engineering and Computer Science (ICIECS) 2009, Wuhan, China, 2009, pp. 3526-3529.
[48] Java Sun. JMF 2.1.1 Solutions. [Online]. http://java.sun.com/javase/technologies/desktop/media/jmf/2.1.1/solutions/index.html
[49] trixbox. trixbox quick install guide. [Online]. http://trixbox.org/wiki/trixbox-quick-install-guide
[50] R. Fielding et al., "RFC 2616: Hypertext Transfer Protocol - HTTP/1.1," IETF, 1999.
[51] Engin Bozdag, Ali Mesbah, and Arie van Deursen, "A Comparison of Push and Pull Techniques for AJAX," Delft University of Technology Software Engineering Research Group, 2007.
[52] Norbert Huffschmid. HawHaw. [Online]. http://www.hawhaw.de
[53] Norbert. (2010, April) HAWHAW Developers group. [Online]. http://groups.yahoo.com/group/hawhaw/message/292
[54] Norbert. (2010) HAW_raw Class Reference. [Online]. http://www.hawhaw.de/ref/php/html/classHAW_raw.html
[55] Opera. (2010) Opera Mobile blog. [Online]. http://my.opera.com/operamobile/blog/
[56] W3C. (2008, July) Authoring applications for the Multimodal Architecture. [Online]. http://www.w3.org/TR/mmi-auth
[57] "State Control Editor 8.0 user manual," Intervoice Convergys, 2010.
[58] (2010) Simplified Wrapper and Interface Generator (SWIG). [Online]. http://www.swig.org
[59] Tree. (2005, July 01) NoodleGlue: Bridging C/C++ and Java. [Online]. http://www.drdobbs.com/java/184401984?pgno=1
[60] Q. Xie and D. Pearce. (2005, May) RTP Payloads for ETSI DSR Codecs. [Online]. http://www.rfc-editor.org/rfc/rfc4060.txt
[61] L. B. Larsen, "Assessment of spoken dialogue system usability - what are we really measuring," in EUROSPEECH 2003 Proc., Geneva, 2003, pp. 1945-1948.
[62] Paolo Baggia, "Voice Browser and Multimodal Interaction in 2009," Google TechTalk, 2009.
[63] EMMA 1.0. [Online]. www.w3.org/TR/emma
[64] Jim Larson, VoiceXML: Introduction to Developing Speech Applications. Prentice-Hall, 2002.
[65] IETF. (2007, July) Distributed Multimodal Synchronization Protocol. [Online]. http://tools.ietf.org/html/draft-engelsma-dmsp-04
[66] [Placeholder]
[67] Antonio Coronato and Giuseppe De Pietro, "Middleware Services for Multimodal Interactions in Smart Environments," in Sixth Annual IEEE International Conference on Pervasive Computing and Communications, 2008, pp. 515-519.
[68] John. (2008, July) Health Information and the New iPhone. [Online]. http://www.emrandhipaa.com/emr-and-hipaa/2008/07/13/health-information-and-the-new-iphone/
[69] David Martin et al. Open Agent Architecture. [Online]. http://www.ai.sri.com/~oaa
