Research on Cross-Media Web Information Retrieval and Adaptive Information Display for Mobile Devices
Abstract
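In the information age, vast amounts of digital content are generated, stored, transmitted, and transformed worldwide in multimedia form (text, graphics, images, video, audio, and so on), and the volume is growing explosively. The rapid development of the Internet in particular has further driven the diversification of information services and the sharp expansion of multimedia data. Most current Web information services and content are designed for traditional desktop computers: they use a single fixed publishing format and deliver exactly the same content to every client, without considering differences in the client's subjective and objective environment. As a result, in most cases they are ill-suited to mobile devices with limited screens, storage, computing power, and network connectivity, such as personal digital assistants and mobile phones.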
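     These emerging mobile devices are now in widespread use. According to statistics, by the end of 2006 China had more than 300 million mobile phone users. Backed by the power of the Internet, these devices give users an unprecedented and ubiquitous information experience. Mobile search, a product of combining Internet search technology with mobile communication technology, removes the constraint that searching for information requires a desktop computer, so mobile device users can obtain the information they need anytime and anywhere.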
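     On mobile devices, however, information is for various reasons not as easy to access as on desktop computers, and existing information access methods do not carry over well. Mobile search is by no means a matter of copying Internet search onto a phone. Compared with Internet search, mobile search places higher demands on search accuracy, on the hint information that accompanies search results, and on meeting personalized search needs. A mobile search engine needs to analyze users' usage habits and to offer a degree of intelligence and convenient user interaction. Another major difficulty in conveniently obtaining information on mobile devices is how to browse it under the constraints of low-bandwidth network connections and small display screens. Against this background, how to provide mobile device users with effective, convenient information retrieval and with adaptive, structured multimedia content suited to different devices has gradually become a research focus.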
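     This dissertation focuses on using cross-media retrieval techniques to meet the multimedia information retrieval needs of mobile device users and to improve the mobile search experience. In addition, through intelligent multimedia content analysis and the construction of multimedia content representation models, it achieves adaptive multimedia information display and gives mobile device users an easy-to-use way to interact with multimedia information. The main research content and contributions are as follows: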
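     1. The background, technical drivers, and market prospects of mobile search are introduced; related work on mobile search and on Web multimedia content adaptation is summarized; and the scope of current research in both areas, together with the problems it faces, is discussed.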
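     2. Starting from the development and current state of the mobile Internet, and drawing on ongoing research and applications, the current state of mobile search technology is surveyed fairly comprehensively. The characteristics and techniques of mobile search are reviewed and organized in detail, and possible future research directions are discussed.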
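     3. The concept of cross-media retrieval and its related techniques are explained; cross-media retrieval is compared with single-media retrieval; and the techniques commonly used in cross-media retrieval, along with their current state of development, are analyzed.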
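     4. A concrete example, the Photo2Search mobile search system, illustrates how cross-media retrieval techniques can be applied to mobile search. A set of algorithms for visual search over large volumes of data and a mobile search scheme for mobile devices with built-in cameras are proposed. The scheme supports multimodal queries and enables intuitive, convenient visual search on mobile devices. Experimental results on a massive real-world Web data set show that the proposed solution and algorithms are efficient and practical.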
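     5. Taking full advantage of the multimodal information contained in TV programs, a Web-style TV program browsing scheme is proposed and implemented. Multimodal information is used to structure TV programs, and cross-media retrieval techniques are used to obtain Web content related to a program and to generate hyperlinks. An integrated Web-style browsing interface with adaptive content zooming is provided to meet the needs of access from different devices.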
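     6. To address the problem of diverse mobile devices obtaining the Web multimedia information they need, a scalable and adaptive visual attention content model for images is proposed. Several adaptive image display schemes are built on this model, and adaptive video display is studied in combination with H.264-based spatial transcoding.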
In the current information age, a large amount of digital content is generated, stored, distributed, and transformed in multimedia form (text, graphics, images, video, audio, etc.), and its growth is explosive. The rapid development of the Internet in particular drives the continual emergence of application services and the rapid expansion of multimedia data. Most current multimedia information services and content are designed specifically for desktop PCs. In many cases they are not suitable for devices such as personal digital assistants and smartphones, which have relatively limited display capability, storage, processing power, and network access.
     Mobile devices are becoming increasingly popular in daily life. With the ability to access the Internet, these devices offer users a brand-new experience of pervasive information. As a combination of Internet search and mobile communication technologies, mobile search overcomes the limitation that information can only be retrieved and browsed on a desktop PC. With mobile search technology, users can freely acquire the information they need on the go.
     However, accessing information on mobile devices is still not as easy as on desktop PCs. Current information retrieval methods do not apply equally well to mobile devices, and mobile search is not a simple copy of Internet search onto a phone. It requires further consideration of the accuracy of retrieved results, the supplementary information shown with each result entry, and personalized information needs. Usability is also essential: a mobile search engine should offer an intelligent and friendly user interface.
     A further difficulty in acquiring information on mobile devices is how to browse the retrieved information over a low-bandwidth connection and on a small display. Currently, most Internet information services publish media data in a single form, delivering exactly the same content to every user. The objective and contextual differences among terminals are not taken into account. Consequently, growing research attention has been drawn to two questions: how to let mobile users obtain Web information effectively and easily, and how to make multimedia content structured and more adaptive to different devices.
     This dissertation focuses on facilitating multimedia information search on mobile devices and improving the user experience of mobile search. In addition, we study intelligent multimedia content analysis and modeling to achieve adaptive multimedia display and to give mobile users a friendly interface for accessing multimedia data. The main content and contributions of this dissertation are as follows:
    1. The background, motivation, and prospects of mobile search are introduced. Research on mobile search and multimedia content adaptation is reviewed. We also discuss the main topics and problems in mobile search and multimedia content adaptation.
    2. We introduce the development and current status of the mobile Web. The literature on mobile search technology is reviewed in light of related ongoing research. The characteristics and techniques of mobile search are described in detail, and possible future research directions are discussed.
    3. We elaborate on the concept and related techniques of cross-media information retrieval and compare it with information retrieval over a single kind of media. The common technical solutions in cross-media information retrieval are also discussed.
    4. We illustrate how cross-media information retrieval techniques can be applied to mobile search with an example, the Photo2Search mobile search system. A set of visual search algorithms for large-scale data is proposed as part of a mobile search scheme for devices with an embedded camera. The system supports multimodal queries and provides an easy way to carry out visual search on mobile devices. Experimental results on large-scale Web data show that the approach is effective and efficient.
    5. Exploiting the multimodal information in TV programs, we propose and implement a Web-like browsing scheme for TV programs. Multimodal information is used to structure the TV program and to produce hyperlinks to related Web content. An integrated Web-like display interface is provided in an adaptive and zoomable manner, so that it can be used at different display sizes.
    6. A scalable and adaptive image attention model is proposed for heterogeneous devices and universal access. Based on this model, several solutions for adaptive image display are designed. Moreover, we combine the image attention model with video transcoding based on the H.264 encoding standard to obtain an effective video adaptation scheme for mobile devices.
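
    As a rough illustration of the idea behind item 6, the sketch below assumes that an attention model has already produced a per-pixel saliency map for an image, and it picks the fixed-size window with the largest total saliency as the region to show on a small screen. The function name best_crop, the list-of-lists map, and the sum-of-saliency criterion are assumptions made for this example; the abstract does not specify how the dissertation's display schemes actually choose regions.

```python
from typing import List, Tuple


def best_crop(saliency: List[List[float]], crop_w: int, crop_h: int) -> Tuple[int, int, float]:
    """Return (left, top, score) of the crop_w x crop_h window with the largest summed saliency.

    Hypothetical helper: `saliency` is an assumed per-pixel attention map,
    indexed as saliency[row][column].
    """
    rows, cols = len(saliency), len(saliency[0])
    if crop_w > cols or crop_h > rows:
        raise ValueError("crop window is larger than the saliency map")

    # Summed-area table with a zero border:
    # sat[y][x] holds the sum of saliency over rows [0, y) and columns [0, x).
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0.0
        for x in range(cols):
            row_sum += saliency[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    best = (0, 0, float("-inf"))
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            # Window sum from four table lookups.
            score = sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]
            if score > best[2]:  # strict comparison keeps the first window on ties
                best = (left, top, score)
    return best
```

    A device with a smaller screen would simply call the function with a smaller crop_w and crop_h on the same map, which is one simple way to read "scalable and adaptive" here. A real system would likely also weigh minimum perceptible size and aspect ratio, which this sketch ignores.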
