Research on Feature Extraction and Selection for Spam Images
Abstract
Spam image recognition is an active topic in Internet spam filtering research. It addresses the problem that traditional text-based spam filters degrade sharply, or fail outright, when spam is delivered as images. The feature extraction and feature selection methods used to build the feature model are the key to solving this problem. Because email is one of the main channels for delivering spam images, this dissertation takes the spam images carried in email as its subject and studies noise-robust extraction of image region and image edge features, supervised feature selection based on information-theoretic criteria, and semi-supervised feature selection for the labeling bottleneck problem. The main contributions of this dissertation are the following four:
     1. A noise-robust method for automatic text region extraction is proposed, relaxing the image-quality requirements of existing methods. An eight-neighborhood algorithm for removing small regions and a filtering scheme for candidate (false) text regions reduce the interference that complex backgrounds and irregular image text cause in text region segmentation. On top of this, a Hough-transform-based algorithm computes the minimum enclosing rectangle of each labeled region, which overcomes the inability of existing methods to extract slanted (non-horizontal) text regions. Experimental results show that the method improves the precision of text region extraction and thus yields more effective text region features.
     2. A method for extracting edge features from email images is proposed. It uses the higher-order local autocorrelation (HLAC) function to extract image edge features. The resulting HLAC features reflect the inherent edge correlation of the image content and are insensitive to shift and scale changes, so the method is noise robust and free of the constraints that existing algorithms place on the edge distribution or on the amount of text in the image. Experiments on a real data set confirm that HLAC features are effective discriminative features for spam images.
     3. A feature selection algorithm based on an information-theoretic criterion is proposed. Existing algorithms evaluate feature redundancy independently of the classification task at hand. To address this, the algorithm defines classification-redundant features and a measure for them, classification information gain, and removes such features before the candidate features are evaluated, which reduces noise in the evaluation. Because most information criteria cannot handle feature synergy (cooperation among features) correctly, the algorithm also uses conditional mutual information to design a criterion that appropriately estimates the information carried by feature interaction. Experimental results show that the algorithm reduces the dimension of the feature space and improves classification performance.
     4. A graph-based semi-supervised feature selection algorithm is proposed. Building on the cluster assumption, it extends Laplacian Score, an unsupervised feature selection algorithm based on spectral graph theory. By constructing a within-class similarity matrix and a between-class scatter matrix, it evaluates features by how well they preserve the global and local structure of the samples. It also uses classification information gain to remove redundant features, which existing score-function-based algorithms cannot handle. Experimental results show that, on data sets with very few labeled samples, the algorithm removes redundant features and selects feature subsets with strong predictive power.
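To make the starting point of contribution 4 concrete, the following is a minimal sketch of the plain unsupervised Laplacian Score that the dissertation extends. It is not the semi-supervised algorithm proposed here. The function name `laplacian_score`, the kernel width `t`, the toy data, and the use of a fully connected heat-kernel graph (instead of a k-nearest-neighbor graph) are all assumptions made for illustration.

```python
import math

def laplacian_score(X, t=1.0):
    """Return one Laplacian Score per feature (column); lower is better."""
    n, d = len(X), len(X[0])
    # Assumption: fully connected graph with heat-kernel weights.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                S[i][j] = math.exp(-dist2 / t)
    deg = [sum(row) for row in S]          # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        # f^T L f equals half the weighted sum of squared differences.
        num = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                        for i in range(n) for j in range(n))
        # f^T D f is the degree-weighted variance of the feature.
        den = sum(fi * fi * di for fi, di in zip(f, deg))
        scores.append(num / den if den > 0 else float("inf"))
    return scores

# Toy data (hypothetical): feature 0 separates two clusters,
# feature 1 varies inside each cluster.
X = [[0.0, 0.0], [0.1, 1.0], [0.0, 2.0],
     [5.0, 0.0], [5.1, 1.0], [5.0, 2.0]]
scores = laplacian_score(X)
```

On this toy data the cluster-separating feature receives the lower score, which is the locality-preserving behavior the score is designed to reward. The score looks at each feature in isolation, so two copies of the same feature get the same score. This is the redundancy problem that the proposed algorithm addresses with classification information gain.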
     These results offer new research directions and promising solutions for automatic spam image recognition, and hence for filtering image-based spam.
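The feature synergy mentioned in contribution 3 can be illustrated with a small sketch. It estimates mutual information and conditional mutual information from discrete samples and applies them to an XOR problem, where each feature alone carries no information about the class but the two together determine it. The function names and the toy data are hypothetical, and this is only the standard plug-in estimate, not the selection criterion designed in the dissertation.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    cxy, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in bits from discrete samples."""
    n = len(xs)
    cxyz = Counter(zip(xs, ys, zs))
    cxz, cyz, cz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * log2(c * cz[z] / (cxz[(x, z)] * cyz[(y, z)]))
               for (x, y, z), c in cxyz.items())

# XOR toy problem: the class y is x1 XOR x2.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]

alone = mutual_information(x1, y)                       # x1 by itself
given_x2 = conditional_mutual_information(x1, y, x2)    # x1 once x2 is known
```

Here `alone` is 0 bits while `given_x2` is 1 bit. A criterion that scores features only by their individual mutual information with the class would discard both features, whereas a criterion built on conditional mutual information can recognize that they are informative together.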