Research and Application of Key Technologies for Content-Based Sensitive Image Filtering
Abstract
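At present, efforts to protect users from online pornography commonly rely on URL blocking and sensitive-keyword matching. These techniques have obvious lag and limitations, and they must be combined with image filtering to stop the spread of pornography more effectively. Against this background, and supported by the 2004 Zhuhai Science and Technology Project (PC20041101), "Research on Content-Based Sensitive Image Filtering Technology and Its Implementation in the IE Browser", this thesis studies several key technologies in content-based sensitive image filtering and, on that basis, constructs and implements an effective sensitive image filter. Finally, URL blocking, keyword matching, and sensitive image filtering are combined and applied in the IE browser to provide online detection of sensitive information.
    This thesis first builds a fairly complete experimental image library, comprising an annotated skin-color mask library and a test library, on which the subsequent research is based. The skin-color detection model is the core of the architecture: three skin-color detection models are compared on the annotated mask library, and the best-performing one is trained and used to mark skin regions in images. The mask image produced by initial skin detection still needs some auxiliary processing, including texture confirmation, filtering, and denoising. To reduce the false-detection rate on portrait-style images such as photo-sticker pictures, a face detection mechanism is introduced into the system, using a fast face detection algorithm based on AdaBoost. To classify the two classes of images effectively, the thesis draws on the characteristics of sensitive images, extracts and tests effective classification features from the mask image and the original image, and uses them as the classifier's input feature vector. Finally, an effective decision tree is built by training with the C4.5 algorithm, and the resulting classification rules are used to classify feature vectors. For practical use, the filter is also deployed as a plug-in in the IE browser, providing real-time detection and filtering of sensitive web pages.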
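The decision-tree step can be illustrated with a small sketch of how a C4.5-style learner scores a split on one continuous feature using gain ratio. This is only a simplified illustration, not the thesis's implementation: the feature name `skin_ratio`, the sample values, and the helper names are hypothetical, and real C4.5 adds pruning and other refinements not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one feature."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    w_left, w_right = len(left) / total, len(right) / total
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


def best_threshold(values, labels):
    """Score midpoints between consecutive distinct values; return (threshold, ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0  # a constant feature cannot be split
    return max(((t, gain_ratio(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical feature: fraction of pixels marked as skin in the mask image.
skin_ratio = [0.05, 0.10, 0.40, 0.55]
classes = ["benign", "benign", "sensitive", "sensitive"]
threshold, ratio = best_threshold(skin_ratio, classes)
```

In this toy data the midpoint between 0.10 and 0.40 separates the two classes perfectly, so it receives the highest gain ratio. A full tree builder would repeat this scoring over every feature and recurse on each side of the chosen split.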
The flood of erotic content on the network not only seriously harms the physical and mental health of teenagers but also causes considerable inconvenience to people's normal use of the Internet. How to keep such content off the Internet is an important research topic that has attracted many researchers at home and abroad. In China, research on anti-pornography software began at the end of the last century; to date, about twenty such products have been developed, and they rely on blocking Web addresses and matching erotic keywords. With the rapid growth of information on the Internet, these technologies show obvious lag and shortcomings, so image filtering must be combined with them to block the dissemination of erotic content effectively. Supported by the 2004 Zhuhai Science and Technology Planning Project "Research on Content-Based Erotic Image Filtering Technique and its Application in IE", we study the key technologies of content-based erotic image filtering and construct and implement an effective erotic image filter. Finally, we combine the image filter with Web address blocking and keyword matching to form a layered filter for erotic page information, and we embed the filter in the IE browser as a plug-in to filter erotic information online.
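A layered page filter of this kind might be sketched as follows. This is a minimal illustration under assumptions: the function name `block_reason`, the cheapest-first ordering of the layers, and the stub image classifier are hypothetical and are not taken from the thesis.

```python
from urllib.parse import urlparse


def block_reason(url, page_text, images, blocked_hosts, keywords, image_is_erotic):
    """Return which layer blocks the page ("url", "keyword", "image"), or None.

    Layers run from cheapest to most expensive (an assumed ordering), so the
    image classifier is only called when the earlier layers let the page pass.
    """
    host = urlparse(url).hostname or ""
    if host in blocked_hosts:
        return "url"
    lowered = page_text.lower()
    if any(word in lowered for word in keywords):
        return "keyword"
    if any(image_is_erotic(image) for image in images):
        return "image"
    return None


# Stub standing in for the trained image classifier (hypothetical).
def stub_classifier(image_bytes):
    return image_bytes == b"flagged"


hosts = {"blocked.example"}
words = {"badword"}
r1 = block_reason("http://blocked.example/a", "hello", [], hosts, words, stub_classifier)
r2 = block_reason("http://ok.example/a", "Some BadWord here", [], hosts, words, stub_classifier)
r3 = block_reason("http://ok.example/a", "hello", [b"fine", b"flagged"], hosts, words, stub_classifier)
r4 = block_reason("http://ok.example/a", "hello", [b"fine"], hosts, words, stub_classifier)
```

Here `r1`, `r2`, and `r3` name the layer that blocked each page, and `r4` is `None` because no layer objected.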
    This thesis discusses several key technologies of content-based erotic image filtering. After studying existing research results, we design and implement an effective erotic image filter. The main work of the dissertation is as follows:
    (1) We construct a fairly complete image database, containing an annotated skin-mask bank of 1442 images and a test image bank of 15890 images, and label the images according to the classification strategy. All the work in this thesis is based on this image database.
    (2) This dissertation analyses the skin-color detection models currently in common use. Broadly, there are two research directions for skin-color detection: one is based on single pixels, and the other on both single pixels and neighborhood information between pixels. We mainly compare three algorithms for detecting skin-color pixels: the Chroma Space Algorithm, the Bayes Classifier Algorithm based on a skin-color statistical histogram, and the Seed Diffusion Algorithm based on neighborhood information. Weighing precision against speed, we finally select the Bayes Classifier Algorithm based on a skin-color statistical histogram. We determine the optimal threshold by estimating the Equal Error Rate and choose the threshold θ = 0.1175 on our training set. With this decision threshold, skin-color detection reaches a correctness of 91.52% on a test set of 481 images, with an error rate of 8.62%, and the detection time for a 574*691-pixel image is 0.0738 seconds.
    (3) To effectively reduce the classification error rate for portrait images, a human face detection mechanism is introduced into the filter. Weighing precision against speed, we use the face detection mechanism proposed by P. Viola, which combines AdaBoost with a cascade. We choose appropriate parameters through experiments; with the trained parameters, face detection reaches a correctness of 85.56% on a test set of 817 images, and the detection time for a 740*784-pixel image is 0.6353 seconds. The results show that adding the face detection mechanism to our erotic image classifier improves the precision of the system considerably (about 10% on our test set).
    (4) Feature vector extraction and evaluation for classifying erotic images, and construction of the decision tree classifier. Before classification, we extract ten features in all that are conducive to classification from the mask image and the corresponding original image, evaluate each feature's classification capability, and then select five features as our feature set. We construct an effective decision tree classifier by training on 3624 given examples with the C4.5 algorithm, and then use the final rules to classify the feature vector.
    Experiments and analysis show that our erotic image classifier can effectively distinguish benign images from erotic images: its precision is about 91.13% on our test set of 5053 images (76.13% for erotic image recognition and 92.68% for benign images). With a view to the application of our project, we add the final classifier to the IE browser as a plug-in using BHO (Browser Helper Object) technology to filter erotic information online.
    Many aspects of our filtering system still need to be improved and refined, such as a more efficient skin-color pixel detection model, the detection of specific parts of the human body, and the optimization of the system's real-time performance; these are our future work.
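The histogram-based Bayes skin classifier and the equal-error-rate threshold choice described in item (2) could look roughly like the sketch below. It assumes RGB histograms with 32 bins per channel and a threshold applied to the posterior P(skin | color); the abstract does not state these details, so the bin count, color space, and all function names are assumptions.

```python
from collections import Counter


def quantize(pixel, bins=32):
    """Map an (r, g, b) pixel with 0-255 channels to a histogram bin."""
    step = 256 // bins
    r, g, b = pixel
    return (r // step, g // step, b // step)


def build_histogram(pixels, bins=32):
    """Count how many training pixels fall into each color bin."""
    return Counter(quantize(p, bins) for p in pixels)


def skin_posterior(pixel, skin_hist, nonskin_hist, bins=32):
    """P(skin | color) by Bayes' rule, with priors taken from the pixel counts.

    With count-based priors the posterior reduces to s / (s + n), where s and n
    are the skin and non-skin counts in the pixel's bin.
    """
    key = quantize(pixel, bins)
    s, n = skin_hist[key], nonskin_hist[key]
    return s / (s + n) if s + n else 0.0  # unseen colors are treated as non-skin


def equal_error_threshold(scores, labels, candidates):
    """Pick the candidate where false-positive and false-negative rates are closest.

    `labels` holds True for skin pixels; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, best_t = None, None
    for t in candidates:
        fn = sum(1 for s, y in zip(scores, labels) if y and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        gap = abs(fp / negatives - fn / positives)
        if best_gap is None or gap < best_gap:
            best_gap, best_t = gap, t
    return best_t


# Toy training pixels (hypothetical values, not from the thesis image bank).
skin_hist = build_histogram([(200, 120, 100)] * 9 + [(90, 90, 90)])
nonskin_hist = build_histogram([(90, 90, 90)] * 9 + [(200, 120, 100)])
p_skin_tone = skin_posterior((200, 120, 100), skin_hist, nonskin_hist)
p_gray = skin_posterior((90, 90, 90), skin_hist, nonskin_hist)

theta = equal_error_threshold(
    scores=[0.9, 0.8, 0.3, 0.2, 0.6, 0.1],
    labels=[True, True, True, False, False, False],
    candidates=[0.1, 0.25, 0.5, 0.7],
)
```

A pixel would then be marked as skin when its posterior is at least the chosen threshold. In practice the candidate thresholds would be swept over a fine grid on the annotated mask bank rather than the four values used here.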
