Research on Key Technologies of Image Spam Filtering
Abstract
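While email makes communication convenient, it has also gradually become a convenient channel for people with ulterior motives to send advertisements, spread obscene and pornographic content, commit malicious fraud, and propagate reactionary ideas and rhetoric. At present, filtering of text-based spam achieves good results. Since 2006, however, to evade traditional filtering systems, spammers have begun moving the text of their messages into images, often adding deformed text and various kinds of noise to further defeat the filters; these techniques greatly reduce filter performance. Compared with traditional spam, image spam is more covert, consumes more network bandwidth, computing, and storage resources, and poses greater security risks to society, so filtering it effectively has become urgent. To prevent the further spread of image spam, this paper studies several key problems, addressing the different characteristics of spam images and the needs of practical applications.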
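     Analysis of how spam is generated and sent shows that spam images are sent in batches; spam images from the same source are mostly generated from the same template and usually share similar structures or regions. Based on this characteristic, this paper analyzes the main problems in near-duplicate image detection and proposes a method that improves the accuracy of near-duplicate image matching by combining the neighborhood geometric context of local feature points with global geometric consistency verification among matched points. First, weakly stable feature points corresponding to each SIFT local feature point are extracted and used to generate geometric context information, which avoids the loss of distinctiveness caused by quantizing feature points into visual words. Then, the method checks whether the matched point pairs of two images contain a subset that satisfies a consistent global geometric relationship, further verifying the correctness of a potential image match. Experimental results show that the method effectively improves the accuracy of partial near-duplicate image recognition, which is valuable for filtering spam images when sample images are available.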
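     Another important characteristic of spam images is that they often contain a large amount of text, so the content-based detection used in traditional spam filtering can be borrowed to determine whether an email image contains specific sensitive keywords. This paper proposes a method for image keyword recognition using visual phrases of character primitives. First, maximally stable extremal regions (MSERs) are extracted from the image and used to construct character primitives. Then, visual phrases of character primitives are constructed from the adjacency of the ellipses fitted to the MSER regions; the primitives of one image keyword usually fall in the same visual phrase. Finally, visual phrase similarity is judged by combining element similarity with geometric adjacency relationships. The method requires no preprocessing such as image binarization, layout analysis, or text region localization, and is flexible and robust.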
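     In addition, drawing on the geometric blur descriptor, this paper proposes a method for recognizing Chinese image keywords in scenes with complex interference. Blurring the image with a Gaussian of variable kernel effectively reduces the impact of noise. First, feature points are matched using geometric blur, and potential mismatches are filtered out by analyzing the layout of the matched feature points. Then, because Chinese keywords often contain similarly shaped characters, which usually share the same radical, the size of the region covered by unmatched points in the sample image is analyzed to further improve matching accuracy. Experimental results show that the method performs well at keyword discovery in complex scenes, effectively distinguishes similarly shaped characters, and resists the types of interference commonly used in spam images.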
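     Spam images are highly varied, and different types of email images usually differ considerably in their features. Moreover, in practice a certain rate of missed spam is tolerable, whereas misclassifying a legitimate email usually causes the user a substantial loss. This paper therefore proposes describing images with both local and global features and using a cascade of classifiers to filter different types of spam images layer by layer. To limit the impact of misclassification, information entropy is used to evaluate each classification result; an image whose result is uncertain is either judged again or treated directly as a normal image, with the goal of keeping the miss rate for spam images as low as possible while reducing false alarms on legitimate email images.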
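     To defeat filters, spammers often add a large amount of interfering noise to spam images, so the noise itself can serve as an important cue for identifying them. Based on this characteristic, this paper proposes a method for analyzing the noise in the background regions of email images. First, a wavelet transform is used to obtain the noise feature image of the non-text regions of the email image; then, the noise is measured and classified by connected component analysis of the feature image. The method can serve as a feature extraction module for email images, whose output indicates the “amount” and “type” of noise in the image. Although the amount of noise cannot be used directly to decide whether an image is a spam image, it provides important evidence for subsequent decisions.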
Email has become an indispensable communication tool in daily life. In recent years, however, it has also become a convenient way for people with ulterior motives to send advertising, pornographic material, malicious fraud, and reactionary ideology and rhetoric. Text-based filters have grown in sophistication and are now effective at filtering spam emails. In response, since 2006 spammers have adopted a number of countermeasures to circumvent these text-based filters. One of the most popular spam construction techniques embeds the text message in an image, often with deformed characters and different kinds of noise to further defeat the filters, which poses a new challenge for spam researchers. Image spam is more covert, consumes more network bandwidth, computing, and storage resources, and brings greater security risks to the community, so filtering it effectively has become urgent. To prevent the further proliferation of image spam, we study several key issues according to the different characteristics of spam images and the requirements of practical applications.
     By analyzing how image spam is generated and sent, we find that spam images are typically sent in batches. Spam images from the same source are often generated from the same template and therefore commonly share similar structures and regions. Based on this feature, this paper analyzes the main problems in near-duplicate image detection (NDII) and proposes a novel scheme that combines the neighborhood information of a single local feature with the global geometric consistency of multiple local features to improve the accuracy of near-duplicate image detection. First, we construct geometric contextual information for image local features to enhance the distinctiveness of visual words. Then, we verify the global geometric consistency of a subset of features to further improve the accuracy of the retrieval results. Experimental results show that the proposed method markedly improves the accuracy of NDII, which benefits image spam filtering when sample images are available.
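     The global geometric consistency check can be illustrated with a minimal sketch. It assumes the geometric relationship between two near-duplicate images is a similarity transform (scale, rotation, translation); the abstract does not say which model the thesis uses. The function names, the pixel tolerance, and the inlier threshold are hypothetical choices made for illustration only.

```python
from itertools import combinations

def largest_consistent_subset(matches, tol=2.0):
    """Find the largest set of matches that agree with one similarity transform.

    matches: list of ((x, y), (u, v)) point pairs between two images.
    Returns the indices of the matches in the best subset.
    Assumption: a similarity transform z -> a*z + b on complex coordinates.
    """
    src = [complex(*p) for p, _ in matches]
    dst = [complex(*q) for _, q in matches]
    best = []
    # Every pair of matches defines one candidate transform (exhaustive search,
    # acceptable for the small match sets used in a sketch).
    for i, j in combinations(range(len(matches)), 2):
        if src[i] == src[j]:
            continue  # degenerate pair, no transform can be estimated
        a = (dst[j] - dst[i]) / (src[j] - src[i])  # scale and rotation
        b = dst[i] - a * src[i]                    # translation
        inliers = [k for k in range(len(matches))
                   if abs(a * src[k] + b - dst[k]) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best

def is_near_duplicate(matches, min_inliers=4, tol=2.0):
    """Accept the image pair only if enough matches are mutually consistent."""
    return len(largest_consistent_subset(matches, tol)) >= min_inliers
```

     In this sketch, matches that survive descriptor comparison but do not fit the dominant transform are treated as mismatches, so an image pair is accepted only when a sufficiently large consistent subset exists.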
     One of the most important features of spam images is that they often contain large amounts of text. Therefore, as in text-based spam filtering, we can judge whether an email image contains certain sensitive keywords. This paper proposes a new approach to image keyword spotting using visual phrases of character primitives. First, maximally stable extremal regions are extracted from a given image and normalized to form our character primitives. The primitives of the same keyword often lie within the same phrase. Then, we measure similarity using element similarity and geometric structure consistency. This method does not require image binarization, layout analysis, or text area localization, and it is more flexible and robust.
     In addition, this paper proposes a method based on geometric blur descriptors for image keyword spotting in cluttered scenes. Blurring the image with variable Gaussian kernels reduces the impact of noise interference. First, we obtain initial correspondences between local feature points using geometric blur and filter out mismatches by layout analysis. Because Chinese characters often share the same radicals, we use the ratio of the area of the unmatched feature points in the sample image to the area of the whole image to further improve matching accuracy. The experimental results show that our method recognizes and spots keyword images with high accuracy and resists the noise used in spam images well.
     Spam images vary widely, and different kinds of spam images often have different types of features. Furthermore, false positives cause greater losses for email users, while false negatives are tolerable to some extent in practice. Therefore, this paper proposes using both local and global features to describe spam images and a cascade of classifiers to filter different types of spam images hierarchically. To avoid false positives, we use classification entropy to decide whether an image should be judged again or treated as a normal image. The experimental results show that this not only reduces the false positive rate of the filters as much as possible but also improves accuracy.
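     The entropy-based decision rule can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's implementation: each cascade stage is assumed to return a spam probability, the entropy threshold of 0.5 bits is arbitrary, and the names `binary_entropy` and `cascade_filter` are hypothetical.

```python
import math

def binary_entropy(p_spam):
    """Shannon entropy (in bits) of a two-class prediction."""
    probs = (p_spam, 1.0 - p_spam)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def cascade_filter(image, stages, max_entropy=0.5):
    """Run classifier stages in order and flag spam only on a confident verdict.

    stages: callables mapping an image to a spam probability in [0, 1]
            (assumed interface, one stage per type of spam image).
    An uncertain result (entropy above max_entropy) is passed to the next
    stage; if no stage is confident that the image is spam, it is treated
    as a normal image, which favors a low false positive rate.
    """
    for stage in stages:
        p_spam = stage(image)
        if binary_entropy(p_spam) <= max_entropy and p_spam >= 0.5:
            return "spam"
    return "ham"
```

     A prediction of 0.6 has an entropy close to 1 bit and is treated as uncertain, whereas a prediction of 0.95 falls under the threshold and is accepted as a confident spam verdict.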
     Spam images commonly contain many background noise components intended to defeat spam filters. Therefore, the presence of background noise can be considered an indication that an email is spam. Based on this feature, this paper proposes obtaining a noise feature image using the wavelet transform and then measuring and classifying the noise by connected component analysis of the noise feature image. This technique is intended to be used as a specific module of a spam filter, whose output indicates the “amount” and “type” of noise in email images. Since noise can also be present in legitimate images, the results of noise analysis cannot establish with certainty that an email is spam, but they can be taken as an indication of tricks introduced to defeat OCR tools.
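     The connected component step can be sketched as below. The sketch assumes the noise feature image has already been computed (for example from wavelet detail coefficients) and thresholded into a binary grid; that preprocessing is not shown. The two noise types, the size threshold, and the function names are hypothetical and stand in for whatever classification the thesis actually uses.

```python
from collections import deque

def connected_components(mask):
    """Return the size of every 4-connected component in a binary grid."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited noise pixel.
            seen[y][x] = True
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def describe_noise(mask, speckle_max_size=3):
    """Return (amount, kind): noise pixel fraction and a rough noise type.

    Assumption: small isolated components are "speckle" (dots); any larger
    component marks the noise as "structured" (lines, blocks).
    """
    sizes = connected_components(mask)
    amount = sum(sizes) / (len(mask) * len(mask[0]))
    if not sizes:
        kind = "none"
    elif max(sizes) <= speckle_max_size:
        kind = "speckle"
    else:
        kind = "structured"
    return amount, kind
```

     The returned pair corresponds to the "amount" and "type" outputs described above and would be passed to a later classifier as features rather than used as a verdict on its own.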
