A mid-level video representation based on binary descriptors: A case study for pornography detection

Authors: Carlos Caetano^{a,b} (carlos.caetano@dcc.ufmg.br) ; Sandra Avila^c (sandra@dca.fee.unicamp.br) ; William Robson Schwartz^b (william@dcc.ufmg.br) ; Silvio Jamil F. Guimarães^d (sjamil@pucminas.br) ; Arnaldo de A. Araújo^a (arnaldo@dcc.ufmg.br)
Keywords: Binary descriptors ; Mid-level representation ; Bag-of-Words ; BossaNova ; Pornography
Journal: Neurocomputing
Publication year: 2016
Publication date: 12 November 2016
Volume: 213
Issue: Complete
Pages: 102-114
Full-text size: 2341 K

Abstract

The growing amount of inappropriate content on the Internet, such as pornography, creates a need to detect and filter such material, since it is often prohibited in certain environments (e.g., schools and workplaces) or for certain audiences (e.g., children). In recent years, many works have focused on detecting pornographic images and videos from their visual content, particularly through skin color detection. Although these approaches provide good results, they generally suffer from a high false positive rate, because not all images with large areas of exposed skin are pornographic (e.g., people wearing swimsuits or images related to sports). Local feature based approaches with Bag-of-Words (BoW) models have been successfully applied to visual recognition tasks in the context of pornography detection. Even though existing methods provide promising results, they use local feature descriptors that are computationally expensive to extract and yield high-dimensional vectors. In this work, we propose an approach for pornography detection based on local binary feature extraction and the BossaNova image representation, a BoW model extension that preserves the visual information more richly. Moreover, we propose two approaches for video description based on the combination of mid-level representations, namely the BossaNova Video Descriptor (BNVD) and the BoW Video Descriptor (BoW-VD). The proposed techniques are promising, achieving an accuracy of 92.40% and thus reducing the classification error by 16% over the current state-of-the-art local features approach on the Pornography dataset.
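
To make the idea of a Bag-of-Words representation over binary descriptors concrete, the following is a minimal sketch and not the authors' implementation. It assumes descriptors are stored as packed `uint8` rows, that a codebook of binary codewords is already available, and that frames are combined by a simple mean. The function names `hamming_distances`, `bow_histogram`, and `pool_frames` are hypothetical, and the mean pooling in particular is an illustrative assumption, since the abstract does not specify how BNVD or BoW-VD combine frame-level representations.

```python
import numpy as np


def hamming_distances(descriptors, codebook):
    """Pairwise Hamming distances between packed binary descriptors.

    descriptors: (n, n_bytes) uint8 array, one packed descriptor per row.
    codebook:    (k, n_bytes) uint8 array, one packed codeword per row.
    Returns an (n, k) integer matrix counting the differing bits.
    """
    # XOR marks the bits that differ; unpackbits exposes them for counting.
    xor = descriptors[:, None, :] ^ codebook[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)


def bow_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword, then L1-normalise."""
    k = codebook.shape[0]
    if descriptors.shape[0] == 0:
        # A frame without keypoints yields an all-zero histogram.
        return np.zeros(k)
    nearest = hamming_distances(descriptors, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=k).astype(float)
    return hist / hist.sum()


def pool_frames(frame_histograms):
    """Assumed video-level pooling: element-wise mean over frame histograms."""
    return np.mean(np.stack(frame_histograms), axis=0)
```

Hamming distance is the natural metric here because binary descriptors are bit strings; comparing them needs only XOR and bit counting, which is the source of the speed advantage over floating-point descriptors that the abstract alludes to.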
