A Min-Max Modular Network with Decision-Tree-Based Module Combination and Its Application to Patent Classification
Abstract
Very-large-scale pattern recognition problems are an increasingly common difficulty that machine learning algorithms encounter in real applications. With the arrival of the information age, such large-scale problems are commonplace in practice; patent classification is one example. Even an efficient learning algorithm such as the support vector machine struggles to cope with classification problems of this scale. Under these circumstances, exploiting abundant computing resources to parallelize machine learning is an important direction in the field.
The min-max modular support vector machine (M3-SVM) is an effective learning algorithm that solves large-scale problems with a "divide and conquer" strategy. It decomposes a large-scale problem into a large number of small-scale subproblems that are learned independently, and then recombines the resulting classifiers into a solution to the original problem through an effective ensemble method. The algorithm is therefore inherently suited to parallelization.
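To make the combination step concrete, here is a minimal Python sketch (not code from the thesis) of the standard min-max rule used by M3-SVM: the positive class is split into N+ subsets and the negative class into N- subsets, one base classifier is trained per subset pair, and the base outputs are merged by MIN units followed by a MAX unit.

```python
import numpy as np

def min_max_combine(module_outputs):
    """Min-max combination for one test sample in M3-SVM.

    module_outputs: array of shape (n_pos, n_neg); entry (i, j) is the
    discriminant value of the base classifier trained on the i-th
    positive subset versus the j-th negative subset.
    """
    # MIN units: each positive subset keeps its worst score across
    # all negative subsets.
    min_units = module_outputs.min(axis=1)
    # MAX unit: the best MIN unit is the discriminant of the original
    # two-class problem; its sign gives the predicted class.
    return min_units.max()

# Example with 2 positive subsets and 3 negative subsets.
outputs = np.array([[0.8, 0.4, 0.9],
                    [-0.2, 0.1, 0.3]])
print(min_max_combine(outputs))  # MIN per row -> [0.4, -0.2]; MAX -> 0.4
```

Because every base module can be trained and queried independently, the scheme maps naturally onto parallel hardware.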
To reduce the time complexity of the module combination stage of M3-SVM, we build on the existing classifier selection methods, asymmetric classifier selection (ACS) and symmetric classifier selection (SCS), and propose a decision-tree-based classifier selection (DCS) algorithm. Experiments show that DCS achieves classification performance similar to the original methods, although it substantially increases the training cost, since a decision tree must additionally be built.
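The abstract does not spell out the DCS algorithm, so the following is only a plausible sketch, under the assumption that the decision tree acts as a router: it predicts which negative-class subset a test sample resembles, so that only the base modules involving that subset are evaluated instead of all of them. The names (`route`, `predict_dcs`, the toy data) are illustrative, not from the thesis.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy setup: positive class split into 2 subsets, negative into 2 subsets.
pos = [rng.normal(loc=c, size=(50, 2)) for c in ([0, 3], [3, 3])]
neg = [rng.normal(loc=c, size=(50, 2)) for c in ([0, 0], [3, 0])]

# One base SVM per (positive subset, negative subset) pair.
modules = [[SVC().fit(np.vstack([p, n]),
                      np.r_[np.ones(len(p)), -np.ones(len(n))])
            for n in neg] for p in pos]

# The decision tree learns which negative subset a sample lies in, so
# only one column of modules has to be queried per test sample.
X_neg = np.vstack(neg)
subset_id = np.repeat(np.arange(len(neg)), [len(n) for n in neg])
route = DecisionTreeClassifier(max_depth=4).fit(X_neg, subset_id)

def predict_dcs(x):
    j = route.predict(x.reshape(1, -1))[0]   # selected negative subset
    scores = [m[j].decision_function(x.reshape(1, -1))[0] for m in modules]
    return 1 if max(scores) > 0 else -1      # MAX over positive subsets

print(predict_dcs(np.array([0.1, 2.9])))  # expected: +1 (near a positive subset)
```

In this reading, DCS trades the per-sample MIN over all negative-subset modules for a single tree lookup, which is where the prediction-time savings come from.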
On this basis, we further propose a method for selecting the training data of the decision tree. This method greatly reduces the time needed to train the decision tree and also reduces the size of the tree. Compared with ACS and SCS, the smaller decision tree is more advantageous in terms of the computational complexity of parallel learning, and it also saves disk storage space.
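The abstract does not give the selection rule, so the sketch below is a hypothetical illustration of the idea only: train the routing tree on a representative fraction of each subset rather than on all samples, which shrinks both the tree-building time and the resulting tree. The centroid-distance criterion used here is an assumption made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def select_training_data(subsets, keep=0.2):
    """Keep a fraction of each subset, preferring points near its centroid.

    Hypothetical selection rule for illustration: if a compact core of
    each subset suffices for the routing tree to separate the subsets,
    both tree-building time and tree size drop.
    """
    X_sel, y_sel = [], []
    for label, X in enumerate(subsets):
        d = np.linalg.norm(X - X.mean(axis=0), axis=1)
        idx = np.argsort(d)[: max(1, int(keep * len(X)))]
        X_sel.append(X[idx])
        y_sel.append(np.full(len(idx), label))
    return np.vstack(X_sel), np.concatenate(y_sel)

# Train the routing tree on ~20% of the samples instead of all of them.
subsets = [np.random.default_rng(i).normal(loc=[3 * i, 0], size=(500, 2))
           for i in range(3)]
X_small, y_small = select_training_data(subsets)
tree = DecisionTreeClassifier().fit(X_small, y_small)
print(tree.tree_.node_count)  # typically far fewer nodes than a full-data tree
```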
In this thesis we design extensive experiments, including a small-scale two-spiral experiment and large-scale patent classification experiments, to verify the claims above.
Abstract (English)
Large-scale pattern recognition problems often restrict the real-world application of many machine learning algorithms. Such problems are common; patent classification is one example. Even for efficient algorithms such as the support vector machine, large-scale problems are still too hard to learn directly. It is therefore attractive to employ affluent computing resources and a parallel computing environment to solve these large-scale problems.
The Min-Max Modular Support Vector Machine (M3-SVM) is a "divide and conquer" based algorithm that can effectively solve large-scale problems. To reduce the time complexity of the module combination step of M3-SVM, we propose a Decision-Tree Classifier Selection (DCS) algorithm. DCS is an evolution of Symmetric Classifier Selection (SCS). Experimental results show that DCS classifies as well as SCS while reducing the time complexity of the prediction step.
We also propose a data selection method for training the decision tree. It greatly reduces both the cost of building the decision tree and the size of the resulting tree. Because of the smaller tree, parallel computing can be applied more effectively in DCS than in SCS, and DCS also saves storage space.
We have carried out many experiments, including the two-spiral problem and patent classification, to support these conclusions.