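Research on Classification Algorithms with a Reject Option for Non-Training-Class Outlier Patterns in High-Dimensional Space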
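Abstract
Classical classification models always assume that a test sample belongs to one of the training classes. In practice, however, the input is often an outlier pattern from a class not seen in training; because the classifier has no ability to reject, it can only return a wrong recognition result. Designing classification models with a reject option is therefore of considerable importance.
     In classification with a reject option, samples from non-training classes are hard to collect, so it is usually assumed that none take part in training. The key to the problem then becomes building a reasonable covering model of how objects of the same class are distributed in high-dimensional space, and deciding whether a test sample lies inside the covering body. Starting from this idea, this paper studies several new classification models with a reject option.
     Following a design approach that combines discrimination with cognition, a nearest-neighbor classification algorithm with a reject option is proposed, based on SRM (Structural Risk Minimization) self-organizing multi-region covering. The algorithm builds a self-organizing multi-region, multi-ball covering cognition model of the training classes according to the structural risk minimization principle, and builds the discrimination model with a combined k-nearest-neighbor strategy. Experimental results verify the effectiveness of the algorithm.
     Under the assumption that samples of the same class lie on the same nonlinear manifold, a classification algorithm with a reject option is studied that combines sparse representation with manifold subspace covering. Local linear patches are sought on the nonlinear manifold to build a compact covering model of the training classes, and a sparse representation strategy is then used to build a discriminative description of the different classes. The method achieves good recognition results.
     To build a reasonable covering of the sample distribution while strengthening the discriminative description of the training samples, a classification algorithm with a reject option is proposed that combines discriminative projection with minimum L1-ball covering. The algorithm extracts discriminative projection features of the samples by L1-norm maximization principal component analysis, and builds in the feature space a minimum L1-ball covering model that is robust to outliers, which improves classifier performance.
     When samples are few, statistical methods for classification with a reject option have difficulty building a compact covering of the sample distribution. For this case, a classification algorithm with a reject option is studied that is based on a minimum spanning tree covering model in high-dimensional space. The algorithm treats the edges of the minimum spanning tree as virtual samples to provide better information about the class distribution, and introduces a covering-radius adjustment strategy to resolve the covering redundancy caused by unreasonable virtual samples.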
In conventional classification problems, a typical assumption made during the design phase is that a new test object always belongs to one of a set of known classes. In many practical applications, however, outliers appear that were not present during training, which leads to wrong recognition results. It is therefore worthwhile to design a classification model with a reject option.
     Classification with a reject option usually assumes that no outlier samples are available during training, because outliers occur only occasionally or are very costly to measure. In this case, the key problem is to find an appropriate covering model for each training class in high-dimensional space, based on the complex geometric distribution of its samples. A test point can then be accepted or rejected by determining whether it lies inside the coverage area. Based on this idea, several novel classification models with a reject option are presented in this paper.
     To combine “matter description” with “matter separation” in classifier design, a nearest neighbor classifier with a reject option is presented, based on a structural risk minimization (SRM) self-organizing multiple-region covering model. Following the structural risk minimization principle, the algorithm constructs a recognition-based self-organizing multiple-region covering model of the training data, which is used to reject outlier classes. A k-NN discrimination step then identifies the exact class of each accepted pattern. Experimental results demonstrate the effectiveness of the classifier.
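The two-stage idea above (first test coverage, then discriminate with k-NN) can be sketched as follows. The abstract does not describe the SRM-based self-organizing construction, so this sketch swaps in a simple greedy cover with one fixed radius; `greedy_ball_cover`, `fit`, `classify`, `radius`, and the toy data are all hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_ball_cover(points, radius):
    """Pick centers so that every point lies within `radius` of some center.
    Assumption: a stand-in for the SRM-based multi-ball construction."""
    centers = []
    for p in points:
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers

def fit(train, radius):
    """train maps each class label to its list of points."""
    return {label: greedy_ball_cover(pts, radius) for label, pts in train.items()}

def classify(x, train, cover, radius, k=3):
    """Return None (reject) if x is outside every ball, else the k-NN label."""
    covered = any(dist(x, c) <= radius
                  for centers in cover.values() for c in centers)
    if not covered:
        return None
    neighbors = sorted((dist(x, p), label)
                       for label, pts in train.items() for p in pts)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy data: two well-separated training classes.
train = {"a": [(0, 0), (0, 1), (1, 0)], "b": [(5, 5), (5, 6), (6, 5)]}
radius = 1.5
cover = fit(train, radius)
```

A point far from both classes, such as `(20, 20)`, falls outside every ball and is rejected before the k-NN vote is ever taken.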
     Under the assumption that the samples of each class are distributed on a nonlinear manifold, a novel classifier with a reject option based on a manifold subspace covering model is constructed. First, a compact coverage of the training samples is built by searching the nonlinear manifold for a collection of local linear models, each described by a subspace, that together describe the training class. Then SRC (Sparse Representation Classifier) is used for classification. Experiments show that the method performs well.
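The SRC step can be illustrated with a minimal sketch: code the test sample sparsely over all training atoms, then assign the class whose atoms alone give the smallest reconstruction residual. The sparse code is computed here with ISTA (iterative soft thresholding) on an L1-regularized least-squares objective, which is an assumed solver choice. The residual threshold `tau` is a crude, hypothetical stand-in for the manifold subspace coverage test, whose construction the abstract does not detail. The atoms and labels are toy values.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def soft(v, t):
    """Soft-thresholding operator, the proximal map of t*|v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def reconstruct(atoms, x, dim):
    """Linear combination sum_j x[j] * atoms[j]."""
    return [sum(xj * atom[i] for xj, atom in zip(x, atoms)) for i in range(dim)]

def ista(atoms, y, lam=0.01, iters=2000):
    """Minimize 0.5*||y - A x||^2 + lam*||x||_1, where the atoms are the columns of A."""
    # 1/||A||_F^2 is a safe step size: it never exceeds 1/lambda_max(A^T A).
    step = 1.0 / sum(a * a for atom in atoms for a in atom)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        recon = reconstruct(atoms, x, len(y))
        resid = [yi - ri for yi, ri in zip(y, recon)]
        corr = [sum(a * r for a, r in zip(atom, resid)) for atom in atoms]
        x = [soft(xj + step * c, lam * step) for xj, c in zip(x, corr)]
    return x

def src_classify(atoms, labels, y, tau):
    """Return the class with the smallest class-wise residual, or None to reject."""
    x = ista(atoms, y)
    residuals = {}
    for c in set(labels):
        xc = [xj if lab == c else 0.0 for xj, lab in zip(x, labels)]
        recon = reconstruct(atoms, xc, len(y))
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    best = min(residuals, key=residuals.get)
    return best if residuals[best] <= tau else None

# Toy dictionary: class "a" atoms lie near the x-axis, class "b" near the y-axis.
atoms = [unit(v) for v in [(1, 0, 0), (1, 0.2, 0), (0, 1, 0), (0.2, 1, 0)]]
labels = ["a", "a", "b", "b"]
```

A sample along the z-axis is orthogonal to every atom, so its code stays at zero, every class residual equals the sample norm, and the sample is rejected.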
     To construct a more compact coverage model while strengthening the discriminative description of the training samples, a classifier with a reject option based on a minimum L1-ball covering model and discriminative feature description is proposed, which replaces the L2 norm of the hyperspherical covering algorithm with the L1 norm. The algorithm extracts discriminative projection features of the training samples by L1-norm maximization principal component analysis. The minimum L1-ball covering model is then constructed in the feature space, which improves the performance of the classifier.
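As a rough illustration of the feature-extraction step, the sketch below finds one projection direction by the sign-flipping iteration commonly used for L1-norm maximization PCA: it seeks a unit vector `w` that maximizes the sum of |w·x| over the samples. It assumes the data are already centered, extracts a single component only, and omits the handling of samples exactly orthogonal to `w`. With one feature, the smallest enclosing ball in any norm is just an interval, so `fit_interval` is exact here; in more dimensions the minimum L1 ball needs a proper optimization, and the robust variant described above is not reproduced. All function names and data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca_l1_direction(X, max_iter=100):
    """Unit vector w that locally maximizes sum_i |w . x_i| (X assumed centered)."""
    x0 = max(X, key=lambda x: dot(x, x))      # start from the largest-norm sample
    n0 = math.sqrt(dot(x0, x0))
    w = [v / n0 for v in x0]
    signs = None
    for _ in range(max_iter):
        new_signs = [1.0 if dot(w, x) >= 0 else -1.0 for x in X]
        if new_signs == signs:                # polarities unchanged: converged
            break
        signs = new_signs
        s = [sum(p * x[d] for p, x in zip(signs, X)) for d in range(len(w))]
        n = math.sqrt(dot(s, s))
        w = [v / n for v in s]
    return w

def fit_interval(features):
    """Smallest enclosing ball of 1-D features: midrange center, half-range radius."""
    lo, hi = min(features), max(features)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def accepts(x, w, center, radius):
    """Accept x if its projected feature lies inside the covering interval."""
    return abs(dot(w, x) - center) <= radius

# Toy centered data, spread mostly along the (1, 1) direction.
X = [(2.0, 2.0), (-2.0, -2.0), (1.0, 1.0), (-1.0, -1.0), (0.5, -0.3), (-0.5, 0.3)]
w = pca_l1_direction(X)
center, radius = fit_interval([dot(w, x) for x in X])
```

Further components would be obtained by removing the found direction from the data and repeating the iteration.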
     For small-sample-size problems, conventional classifiers with a reject option based on statistical models cannot construct an appropriate covering decision boundary for data description. For this case, a novel classifier with a reject option based on a minimum spanning tree (MST) covering model is proposed, according to the data distribution in high-dimensional space. The algorithm describes the target class using an MST, under the assumption that the edges of the graph are also basic elements of the classifier, which provides additional virtual training data for better coverage. Furthermore, to reduce the degradation in rejection performance caused by unreasonable virtual training data, an adjustable coverage radius strategy is introduced into the coverage construction.
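A minimal sketch of the MST covering idea: build the tree over one class with Prim's algorithm, treat every edge as a line segment of virtual samples, and accept a test point when it lies within some radius of the nearest edge. The abstract does not specify the adjustable radius strategy, so a single fixed `radius` is used here as an assumption; the function names and toy points are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prim_mst(points):
    """Edges (i, j) of a minimum spanning tree over the points (Prim's algorithm)."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
        edges.append((i, j))
        in_tree.add(j)
    return edges

def dist_to_segment(x, a, b):
    """Distance from x to the closest point of the segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(v * v for v in ab)
    t = 0.0 if denom == 0 else sum((xi - ai) * v for xi, ai, v in zip(x, a, ab)) / denom
    t = max(0.0, min(1.0, t))                 # clamp the projection onto the segment
    proj = [ai + t * v for ai, v in zip(a, ab)]
    return dist(x, proj)

def accepts(x, points, edges, radius):
    """Accept x if it lies within `radius` of any MST edge (fixed radius assumed)."""
    return min(dist_to_segment(x, points[i], points[j]) for i, j in edges) <= radius

# Toy target class: four points forming an L shape.
points = [(0, 0), (1, 0), (2, 0), (2, 1)]
edges = prim_mst(points)
```

With `radius = 0.2`, the point `(0.5, 0.1)` is accepted although it is about 0.51 from the nearest training sample, because it lies 0.1 from the edge joining `(0, 0)` and `(1, 0)`. This is the extra coverage that the edge-based virtual samples provide.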