Research and Implementation of a Content-Based Sensitive Web Page Filter
Abstract
The rapid development of the Internet lets people transfer and share vast amounts of information with ease. It has brought great convenience to production, daily life, and communication, and has strongly promoted global economic and cultural exchange. However, it also gives lawbreakers an opportunity to publish and spread sensitive information such as pornographic, violent, and reactionary content. The amount of information on the Internet is growing exponentially, and its forms have become richer, shifting gradually from plain text to multimedia such as images and video. Because of their strong visual impact, pornographic, violent, and other sensitive videos are widely spread by lawbreakers. Carried by the Internet, a cross-regional, cross-border, open communication medium, their harmful influence reaches every corner of the world and seriously damages social stability and people's daily lives. The design and development of a sensitive web page filter is therefore of great significance for building a healthy ("green") Internet environment in China, maintaining a stable society, and protecting the physical and mental health of Internet users, especially young people.
To this end, this thesis designs and implements a sensitive web page filter using Browser Helper Object (BHO) technology. The filter consists of three sub-filters: a URL filter, a web page text filter, and a sensitive web image filter. First, the URL is filtered: through BHO, the filter obtains the URL of the requested page from the address bar of the IE browser and compares it with the entries in a sensitive-URL database. If the URL is sensitive, a blank page is returned; otherwise the page text and images are checked. Second, the page text is filtered: when the URL is not sensitive, the browser downloads the page resources, and the DocumentComplete event signals when the page content has finished downloading. The text content is then obtained through the DHTML document model and matched against a sensitive-word database using the maximum-jump (SMA) algorithm. Finally, sensitive images in the page are filtered by a detection algorithm that combines face detection, skin color detection, skin texture detection, and classifier-based recognition. Face detection determines whether the image contains people; texture-based skin color detection using the Sobel operator and a statistical histogram model locates the skin-colored regions; Gabor filtering is applied to those regions for skin texture detection; and a classifier separates sensitive images from non-sensitive ones.
Experimental results show that the filter designed in this thesis can effectively intercept and filter sensitive web pages and largely achieves control over access to sensitive sites.
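The three-stage cascade described in the abstract (URL check, then text check, then image check) can be illustrated with a minimal Python sketch. This is not the thesis implementation, which runs as a BHO inside IE. All names here (`SENSITIVE_HOSTS`, `SENSITIVE_WORDS`, `filter_page`, and the helper functions) are hypothetical, and the blocklists are placeholder data. Two stages are deliberately simplified: a plain substring scan stands in for the SMA matching algorithm, and a commonly cited fixed Cb/Cr range stands in for the Sobel and histogram skin model. Face detection, Gabor texture analysis, and the classifier are omitted.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the sensitive-URL and sensitive-word databases.
SENSITIVE_HOSTS = {"blocked.example"}
SENSITIVE_WORDS = ("forbidden phrase", "banned term")


def url_is_sensitive(url, hosts=SENSITIVE_HOSTS):
    """Stage 1: compare the requested URL's host with the blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def text_is_sensitive(text, words=SENSITIVE_WORDS, threshold=1):
    """Stage 2: count blocklisted phrases (plain substring scan, not SMA)."""
    lowered = text.lower()
    hits = sum(lowered.count(w) for w in words)
    return hits >= threshold


def is_skin(r, g, b):
    """Fixed-range skin test in YCbCr (assumed substitute for the histogram model)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def image_is_sensitive(pixels, skin_ratio=0.4):
    """Stage 3 (greatly simplified): flag an image when skin pixels dominate."""
    pixels = list(pixels)
    if not pixels:
        return False
    skin = sum(1 for (r, g, b) in pixels if is_skin(r, g, b))
    return skin / len(pixels) >= skin_ratio


def filter_page(url, text, images):
    """Run the cascade in order and return 'block' or 'allow'."""
    if url_is_sensitive(url):
        return "block"  # corresponds to returning a blank page
    if text_is_sensitive(text):
        return "block"
    if any(image_is_sensitive(img) for img in images):
        return "block"
    return "allow"
```

The ordering mirrors the abstract: the cheap URL lookup runs first so that text and image analysis only happen for pages that pass it. The `skin_ratio` and `threshold` defaults are arbitrary illustration values, not figures from the thesis.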
