Research on Illegal Image Filtering Technology Based on the PLSA Model
Abstract
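With the rapid development of information and Internet technology in the 21st century, online information has become people's main source of information, and images make up a large share of it; as the network keeps growing, the number of images will continue to increase. Among these images, however, there are large numbers of pornographic and other objectionable images, which cause considerable harm in everyday life and are especially damaging to the healthy development of adolescents. To stop such images from spreading on the Internet, researchers have proposed content-based image filtering. In content-based filtering, capturing the right image features and applying a good filtering strategy are the keys to success. Building on these two points, this thesis proposes an image filtering model that combines the SIFT feature extraction algorithm with a PLSA-based feature matching algorithm.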
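To connect SIFT with PLSA, the local descriptors of each image are commonly quantized against a visual vocabulary and counted, which yields the image-by-visual-word count matrix that PLSA treats as a collection of "documents". The sketch below is a minimal illustration under stated assumptions: the descriptors and the codebook (typically learned with k-means) are taken as given, the toy 2-D vectors stand in for 128-D SIFT descriptors, and the function names `bow_histogram` and `nearest_word` are hypothetical, not taken from the thesis.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word (codebook centre) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bow_histogram(descriptors: Sequence[Vector],
                  codebook: Sequence[Vector]) -> List[int]:
    """Count how many local descriptors of one image fall on each visual word.

    The resulting count vector is one row n(d, .) of the image-by-word
    matrix that PLSA treats as a collection of documents.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    return counts


if __name__ == "__main__":
    # Toy 2-D stand-ins for 128-D SIFT descriptors and a 3-word codebook.
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    descriptors = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.3]]
    print(bow_histogram(descriptors, codebook))  # [1, 2, 1]
```

Hard nearest-neighbour assignment is the simplest quantization choice; soft assignment to several nearby words is a common alternative.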
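     This thesis chooses PLSA as the topic-model algorithm because sensitive-image filtering is a small-sample problem, and the probabilistic latent semantic analysis (PLSA) model offers clear advantages for small-sample, nonlinear, and high-dimensional pattern recognition and classification problems. In addition, the PLSA model has the ability to learn and to retain what it has learned.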
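PLSA models each image d as a mixture of latent topics z, with P(w|d) = Σ_z P(z|d) P(w|z), and is usually fitted with the expectation-maximization (EM) algorithm. The following is a minimal pure-Python sketch of standard PLSA EM on an image-by-visual-word count matrix, not the thesis's own implementation; the number of topics, the iteration count, the random seed and the toy counts are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalise(row: List[float]) -> List[float]:
    """Scale a non-negative row so that it sums to one."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def log_likelihood(counts: List[List[int]], p_z_d: Matrix, p_w_z: Matrix) -> float:
    """sum_{d,w} n(d,w) * log sum_z P(z|d) P(w|z) (the constant P(d) term is dropped)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


def plsa(counts: List[List[int]], n_topics: int, n_iter: int = 50,
         seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    """Fit PLSA with EM on an image-by-visual-word count matrix n(d, w).

    Returns P(z|d) (images x topics), P(w|z) (topics x words) and the
    log-likelihood recorded after every EM iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalise([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalise([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    history: List[float] = []
    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]  # expected counts for P(w|z)
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]   # expected counts for P(z|d)
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    exp_w_z[z][w] += resp
                    exp_z_d[d][z] += resp
        # M-step: renormalise the expected counts into new distributions.
        p_w_z = [_normalise(row) for row in exp_w_z]
        p_z_d = [_normalise(row) for row in exp_z_d]
        history.append(log_likelihood(counts, p_z_d, p_w_z))
    return p_z_d, p_w_z, history


if __name__ == "__main__":
    # Toy counts: images 0-1 mostly use words 0-1, images 2-3 mostly words 2-3.
    counts = [[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 6, 3], [0, 1, 4, 5]]
    p_z_d, p_w_z, history = plsa(counts, n_topics=2)
    for d, row in enumerate(p_z_d):
        print(d, [round(p, 3) for p in row])
    print("final log-likelihood:", round(history[-1], 3))
```

Because each EM iteration cannot decrease the log-likelihood, the recorded `history` is a convenient sanity check; EM can still stop at a local optimum, so results may depend on the seed.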
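     The proposed filtering model aims to improve filtering performance. PLSA is combined with SIFT mainly because the difference-of-Gaussians (DoG) operation that SIFT applies to an image acts like a high-pass filter (strictly speaking, a band-pass filter) that removes low-frequency components from the background. In other words, SIFT keypoints generally do not appear on solid-color backgrounds or on slowly varying gradient backgrounds, so the features that reflect the main content of the image stand out more clearly. This reduces mismatches between the features of a test image and those of the images in the training library, and thus improves filtering performance. Under the same experimental conditions, the classical PLSA algorithm and the algorithm used in this thesis were run through identical experiments, and the results show that the proposed algorithm outperforms classical PLSA.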
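The low-frequency suppression claimed for the DoG operation can be checked numerically. The one-dimensional sketch below computes D = G(kσ) * I − G(σ) * I with normalised Gaussian kernels and edge replication, then compares a flat signal, a slow linear ramp and a sharp step. SIFT applies the same operation in 2-D; σ = 1.6 and k = √2 here are illustrative values, and the helper names are hypothetical.

```python
import math
from typing import List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: List[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian, replicating edge samples at the borders."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # clamping = edge replication
            acc += w * signal[idx]
        out.append(acc)
    return out


def dog(signal: List[float], sigma: float = 1.6, k: float = math.sqrt(2)) -> List[float]:
    """Difference of Gaussians: G(k*sigma) * I - G(sigma) * I."""
    wide, narrow = blur(signal, k * sigma), blur(signal, sigma)
    return [a - b for a, b in zip(wide, narrow)]


if __name__ == "__main__":
    flat = [0.5] * 100                        # solid-colour background
    ramp = [0.01 * i for i in range(100)]     # slowly varying gradient
    step = [0.0] * 50 + [1.0] * 50            # sharp edge
    print(max(abs(v) for v in dog(flat)))          # ~0, floating-point noise only
    print(max(abs(v) for v in dog(ramp)[10:-10]))  # ~0 away from the borders
    print(max(abs(v) for v in dog(step)))          # clearly non-zero near the edge
```

Because both kernels sum to one, a constant region cancels exactly, and because they are symmetric, a linear ramp also cancels; only structure with curvature, such as edges and corners, survives, which is where SIFT extrema are found.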
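     This thesis makes two main contributions:
     1. The visual words in the image training library are optimized according to the features of sensitive regions of the human body, which reduces the number of visual words in the library and thereby makes matching more efficient.
     2. The model applies a general PLSA matching algorithm together with the SIFT algorithm to image filtering. SIFT is scale-invariant and copes well with partially occluded objects, which reduces the influence of the background on feature matching. This overcomes the background interference that arises when PLSA is used on its own.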
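The abstract does not say how the visual vocabulary is optimized for sensitive regions. The sketch below shows one plausible rule, stated purely as an assumption: keep a visual word only if most of its training occurrences fall inside annotated sensitive regions, then project each image histogram onto the reduced vocabulary so that matching runs over fewer dimensions. The names `select_sensitive_words`, `reduce_histogram`, `min_ratio` and `min_count` are hypothetical.

```python
from typing import List, Sequence


def select_sensitive_words(in_region: Sequence[int], total: Sequence[int],
                           min_ratio: float = 0.5, min_count: int = 1) -> List[int]:
    """Hypothetical pruning rule.

    Keep visual word w when it occurs at least `min_count` times in the
    training library and at least `min_ratio` of those occurrences lie
    inside annotated sensitive regions.
    """
    kept = []
    for w, (inside, n) in enumerate(zip(in_region, total)):
        if n > 0 and n >= min_count and inside / n >= min_ratio:
            kept.append(w)
    return kept


def reduce_histogram(hist: Sequence[int], kept: Sequence[int]) -> List[int]:
    """Project a full bag-of-visual-words histogram onto the pruned vocabulary."""
    return [hist[w] for w in kept]


if __name__ == "__main__":
    in_region = [8, 1, 5, 0]   # occurrences inside sensitive regions, per word
    total = [10, 10, 6, 3]     # occurrences anywhere in the training images
    kept = select_sensitive_words(in_region, total)
    print(kept)                                  # [0, 2]
    print(reduce_histogram([3, 4, 1, 2], kept))  # [3, 1]
```

In this toy case the vocabulary shrinks from four words to two, halving the length of every histogram that has to be compared during matching.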