Research on Visual Summarizations of Web Pages for Search

Author: Binxing Jiao (焦斌星)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: visual summarization; internal image; external image; comparison of visual summarizations; best visual summarization
Degree year: 2012
Advisors: Feng Wu (吴枫); Nenghai Yu (俞能海)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2012-05-01

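Abstract

The growth of the Internet has made search engines the primary means by which users look for information, and accuracy and speed are what users need most from them. The accuracy of today's search engines, however, still falls short of users' needs, so there is a pressing need to help users find the information they want quickly even with search technology that is not yet accurate enough.
     Web pages contain a great deal of visual multimedia content, such as images, animations, and videos. As the saying goes, "a picture is worth a thousand words": when a search engine adds such content to its results, users can take in more information in a very short time and so find what they want quickly. Visual multimedia content that may help users search is called the visual summarization of a web page. Because the image is the basic building block of animation and video, this dissertation studies in depth the key problems of using images as visual summarizations.
     Images contained in the web page itself are a reliable source of visual summarizations; we call them internal images. For these images, we propose a dominance model that measures how well an image represents the page: the more dominant an image, the better suited it is as a visual summarization. Many web pages, however, have no dominant internal image, so we propose obtaining images related to the target page from the Internet; we call these external images. For these images, we propose an algorithm that measures their relevance to the target page: the more relevant an image, the better suited it is as a visual summarization. In addition, we compare these two kinds of natural-image-based visual summarizations with synthesized images such as thumbnails and, building on the results of that analysis, propose an algorithm for selecting the best visual summarization. The main results of this dissertation are as follows:
     1. A dominance model for internal images. Because web pages contain large numbers of advertisement images, decorative images, and the like, we propose an algorithm based on image feature extraction and machine learning to measure image dominance. The algorithm extracts image features at four levels and builds the dominance model with LambdaMART, a boosted-tree-based algorithm.
     2. Algorithms for obtaining external images and measuring their relevance. We propose a method for obtaining relevant external images based on keyword extraction and image search, and we measure each image's relevance to the target page from its textual and visual information. The external-image acquisition system finds relevant external images for nearly half of the web pages that lack a dominant internal image, and the relevance measure achieves high precision.
     3. An in-depth comparison of internal images, external images, thumbnails, and Visual Snippets. Using manually labeled data, we compare how well each kind of visual summarization works on different web pages. For example, internal images with high dominance scores are reliable visual summarizations for pages that have internal images, while thumbnails suit pages with a small visible area, with a dominant image inside the screenshot area, or with the logo of a well-known site inside the screenshot area. We also conduct user studies to analyze how useful the visual summarizations are in two applications: understanding a web page and re-finding a web page.
     4. A unified algorithm for selecting the best visual summarization from internal and external images. Because internal and external images each have strengths and weaknesses, we propose a clustering-based algorithm for selecting the best visual summarization. A good visual summarization must be relevant, dominant, and typical, so the algorithm uses the relevance and dominance models above to measure the first two properties and uses clustering to capture typicality. Relevance and dominance serve as prior knowledge for the clustering, and the affinity propagation clustering algorithm combines the three. Once clustering finishes, the best cluster center is chosen as the best visual summarization. The algorithm performs well in both objective and subjective evaluation: in the objective evaluation its NDCG@1 reaches about 0.6, and in the subjective evaluation most users agree that the images it selects can represent the target page.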
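
To make the NDCG@1 figure concrete, the following is a minimal Python sketch of NDCG@k for the candidate images of a single web page. It is an illustration only: the function names, the graded relevance labels, and the common 2^label - 1 gain are assumptions, not details taken from the dissertation.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k graded labels (assumed gain: 2^label - 1)."""
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(labels[:k]))


def ndcg_at_k(scores, labels, k=1):
    """NDCG@k for one page: order candidates by model score, normalize by the ideal order."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal = sorted(labels, reverse=True)
    best = dcg_at_k(ideal, k)
    # A page whose candidates all have label 0 contributes 0 by convention here.
    return dcg_at_k(ranked, k) / best if best > 0 else 0.0


# Hypothetical page with three candidate images: the model ranks a label-1 image
# first although a label-2 image exists, so NDCG@1 = (2^1 - 1) / (2^2 - 1) = 1/3.
example = ndcg_at_k(scores=[0.9, 0.2, 0.5], labels=[1, 2, 0], k=1)
```

With k=1 the measure only asks whether the single top-ranked image is as good as the best available candidate, which matches the setting of showing one image per search result.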
With the rapid development of the Internet, search engines have become the main way for users to seek information. Of all the users' needs, accuracy and speed are the most important. However, the accuracy of current search engines cannot fully satisfy users, so it is essential to help users quickly find the information they need with current search technologies.
     Web pages contain visual content such as images, animations, and videos. A picture is worth a thousand words: search becomes much more efficient if visual information is shown on the search result page, since users grasp an image more quickly than they read text. The visual content that may help users search is called visual summarization. Because the image is the basic component of animation and video, we discuss the key technologies of using images as visual summarizations.
     For a specific web page, the images in the page, called "internal images", are generally reliable visual summarizations. For these images, we proposed a dominance model to measure how well they represent the web page: the more dominant an internal image is, the more appropriate it is as a visual summarization. However, many web pages have no dominant internal image, so we proposed a scheme to obtain images relevant to the target web page from the Internet, called "external images". We also compared these two kinds of natural-image-based visual summarizations with synthesized images such as thumbnails. Based on the comparisons, we further proposed an algorithm to select the best visual summarization from the internal and external images. The main contents and contributions of this dissertation are as follows:
     1. Proposed a dominance model for internal images. Since web pages contain advertisement images, decorative images, and the like, we proposed an algorithm to measure the dominance of internal images based on feature extraction and machine learning. Image features were extracted at four levels, and the LambdaMART algorithm, which is based on boosted trees and optimized for NDCG, was used to build the dominance model.
     2. Proposed algorithms to obtain external images and measure their relevance to the target web page. Relevant external images were obtained from the Internet through key phrase extraction and image search, and their relevance was then calculated from the textual and visual information of these images. Our system finds relevant external images for almost half of the web pages without dominant internal images, and the relevance measure achieves high precision.
     3. Performed comparisons between internal images, external images, thumbnails, and visual snippets. With a human-labeled data set, we analyzed the characteristics of the web pages that were well represented by each kind of visual summarization. For example, internal images with high dominance scores are reliable visual summarizations, and thumbnails are good visual summarizations for web pages with small page sizes, or with dominant images or logos from well-known sites in the snapshot area. We also conducted user studies to compare the visual summarizations in web page understanding and re-finding tasks.
     4. Proposed an algorithm to jointly select the best visual summarization from the internal images and external images. To exploit the respective advantages of internal and external images, we proposed a clustering-based algorithm to select the best visual summarization. This algorithm used relevance and dominance as prior information and captured typicality through the affinity propagation clustering algorithm. The best exemplar found by the clustering was selected as the best visual summarization. Experimental results show that our algorithm achieves an NDCG@1 of about 0.6. Our user study also indicated that the images selected by our algorithm were useful as visual summarizations of web pages.
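
The idea of feeding priors into affinity propagation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the dissertation's implementation: the names `select_best_summary` and `base_preference` are hypothetical, the relevance and dominance scores are assumed to be already merged into one prior in [0, 1], similarity is assumed to be negative squared Euclidean distance, and the "best" exemplar is assumed to be the one with the highest prior. Only the message-passing updates follow the standard affinity propagation rules.

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Standard affinity propagation on similarity matrix S (needs at least 2 points).

    S[k][k] holds the preference of point k: higher means more likely an exemplar.
    Returns the indices of the points chosen as exemplars.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for j, v in enumerate(vals) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


def select_best_summary(features, prior, base_preference=-10.0):
    """Pick one candidate image index from feature vectors and priors in [0, 1].

    Assumption: prior 1 maps to preference 0 and prior 0 to base_preference,
    so well-scored candidates are favored as cluster centers.
    """
    n = len(features)
    S = [[-float(sum((a - b) ** 2 for a, b in zip(features[i], features[j])))
          for j in range(n)] for i in range(n)]
    for k in range(n):
        S[k][k] = base_preference * (1.0 - prior[k])
    # Fall back to all candidates if no exemplar emerges.
    exemplars = affinity_propagation(S) or list(range(n))
    return max(exemplars, key=lambda k: prior[k])


# Equal priors: the central candidate (most typical) wins.
typical = select_best_summary([[0], [1], [2]], [0.7, 0.7, 0.7])
# Equally spaced candidates: the one with the strongest prior wins.
favored = select_best_summary([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.99, 0.0, 0.0])
```

Placing the prior on the diagonal of the similarity matrix is one simple way to let relevance and dominance bias which images become cluster centers, while the off-diagonal similarities still reward images that are typical of many candidates.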

