Research of Image Annotation and Tag Recommendation on Shared Resource Websites

Chinese title: 面向资源共享网站的图像标注和标签推荐技术研究
Author: Chen Ye (陈烨)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Automatic image annotation; Social group; Latent topic mining; Latent Dirichlet Allocation; Multi-group information fusion; Tag recommendation; Random walk
Degree year: 2010
Advisors: Zhuang Yueting (庄越挺); Wu Fei (吴飞)
Discipline code: 081203
Degree-granting institution: Zhejiang University
Thesis submission date: 2009-01-25

Abstract

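With the rapid development of web multimedia technology, the volume of images and other multimedia content on the Internet is growing exponentially. Effective management and retrieval of large-scale Internet images is therefore of great practical importance. Because most Internet images either lack tags or carry tags with substantial noise, automatically adding tags to these weakly-tagged Internet images has become an active research problem.
     This thesis first observes that Flickr users often submit an uploaded image to several related groups according to the topics the image implies, and on that basis proposes an automatic image annotation algorithm built on latent topic mining within groups and multi-group information fusion. The algorithm uses the Latent Dirichlet Allocation model to mine the latent topics of each individual group, filters and ranks the initial candidate tags according to their relevance to the group's latent topics, and finally obtains the annotation through multi-level fusion of topic information across groups. Experiments on images from three groups downloaded from Flickr show that the algorithm substantially improves the precision of automatic image annotation.
     In addition, to help users add tags effectively, this thesis proposes a personalized tag recommendation algorithm that combines the text, image, and user context of a group. The algorithm first builds a ternary matrix of users, images, and tags. It then searches the group for users whose interests are close to those of the target user and for images similar to the target image, which yields the user's personal preference. Finally, it ranks tags with a random walk mechanism to produce the recommendation. The method is general: given any one of a user, an image, or a tag, it can produce a tag recommendation.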
The number of Internet images is growing at an exponential rate, so effectively managing and retrieving large-scale Internet images poses a great challenge. Since a great number of images uploaded to the Internet have no labels, or have only a few noisy labels, automatic annotation of such "weakly-tagged" Internet images has recently become a hot topic.
     Since users tend to recommend images to multiple social groups according to the semantics of the images when they upload them to Flickr, this thesis proposes a two-stage approach to automatically annotate weakly-tagged social images. The first stage discovers the latent topics in each group with the Latent Dirichlet Allocation (LDA) model, then filters out noisy tags at the group level and re-ranks the topic-relevant tags. The second stage uses WordNet to discover the hierarchical topic structure among multiple groups and fuses the candidate tags from those groups along that hierarchy.
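The first stage described above can be illustrated with a minimal sketch. It assumes each image in a group is represented only by its list of tags, fits LDA with a small collapsed Gibbs sampler, and scores a candidate tag by its probability under the group's topic mixture. The function names (`fit_lda`, `rank_tags`), the hyperparameters, the scoring rule, and the threshold are illustrative assumptions, not the thesis's actual formulation.

```python
import random


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tag lists (one list per image)."""
    rng = random.Random(seed)
    vocab = sorted({tag for doc in docs for tag in doc})
    index = {tag: i for i, tag in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # tags of image d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times tag w is assigned to topic k
    topic_total = [0] * K                     # all tag occurrences assigned to topic k
    assign = []

    def update(d, k, w, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every tag occurrence.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(K) for _ in doc])
        for tag, k in zip(doc, assign[d]):
            update(d, k, index[tag], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, tag in enumerate(doc):
                w = index[tag]
                update(d, assign[d][n], w, -1)  # remove the current assignment
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][n] = k
                update(d, k, w, +1)

    # phi[k][w]: smoothed probability of tag w under topic k.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    # Group-level topic mixture (assumption: share of tag occurrences per topic).
    total = sum(topic_total)
    group_topics = [(topic_total[k] + alpha) / (total + K * alpha) for k in range(K)]
    return vocab, phi, group_topics


def rank_tags(candidates, vocab, phi, group_topics, min_score=0.0):
    """Drop candidates unknown to the group or below min_score; sort the rest."""
    index = {tag: i for i, tag in enumerate(vocab)}
    scored = []
    for tag in candidates:
        if tag not in index:
            continue  # never used in this group: treated as noise
        score = sum(p_k * phi[k][index[tag]] for k, p_k in enumerate(group_topics))
        if score >= min_score:
            scored.append((tag, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    group = [["beach", "sea", "sand"], ["beach", "sea", "sun"], ["beach", "car"]]
    vocab, phi, group_topics = fit_lda(group, num_topics=2)
    print(rank_tags(["car", "beach", "keyboard"], vocab, phi, group_topics))
```

In practice a library implementation of LDA would replace the hand-written sampler; the point of the sketch is only how per-group topics can drive the filtering and re-ranking of candidate tags.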
     This thesis also proposes an approach that integrates social text, image, and user context for tag recommendation. The approach first sets up a ternary matrix representing the relationships among users, images, and tags. It then derives a personal preference by finding users with similar interests and images that are visually similar. Finally, it uses a random walk to recommend tags for unlabeled images. The approach is flexible: a recommendation can be produced once any one of a user, an image, or a tag is given.
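The random-walk ranking step can be sketched as a random walk with restart over a graph whose nodes are users, images, and tags. This is a simplified illustration under stated assumptions: edges come only from observed (user, image, tag) assignments, visual similarity between images and the user-similarity search are omitted, and `build_graph`, `recommend_tags`, and the restart probability are hypothetical choices rather than the thesis's formulation. Because the walk can restart from any node type, the query may be a user, an image, or a tag.

```python
from collections import defaultdict


def build_graph(triples):
    """Weighted undirected graph over ("user"|"image"|"tag", name) nodes."""
    graph = defaultdict(lambda: defaultdict(float))
    for user, image, tag in triples:
        u, i, t = ("user", user), ("image", image), ("tag", tag)
        # Each tagging event strengthens all three pairwise links.
        for a, b in ((u, i), (i, t), (u, t)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def recommend_tags(graph, query, restart=0.15, iters=100, top_n=5):
    """Rank tag nodes by their visit probability in a walk restarting at query."""
    if query not in graph:
        return []
    prob = {node: 0.0 for node in graph}
    prob[query] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            if p == 0.0:
                continue
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                # Follow an edge with probability proportional to its weight.
                nxt[neighbor] += (1.0 - restart) * p * weight / total
        nxt[query] += restart  # jump back to the query node
        prob = nxt
    tags = [
        (node[1], p)
        for node, p in prob.items()
        if node[0] == "tag" and node != query and p > 0.0
    ]
    tags.sort(key=lambda pair: (-pair[1], pair[0]))
    return tags[:top_n]


if __name__ == "__main__":
    triples = [
        ("alice", "img1", "beach"),
        ("alice", "img1", "sea"),
        ("bob", "img2", "beach"),
        ("bob", "img2", "sunset"),
    ]
    graph = build_graph(triples)
    print(recommend_tags(graph, ("user", "alice")))
    print(recommend_tags(graph, ("image", "img2")))
```

Tags that are unreachable from the query node receive zero probability and are left out, so the sketch never recommends tags from a disconnected part of the graph.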
