On some transformations of high dimension, low sample size data for nearest neighbor classification
  • Authors: Subhajit Dutta; Anil K. Ghosh
  • Keywords: Bayes risk; HDLSS data; High-dimensional geometry; Inter-point distances; Law of large numbers; Misclassification probability; Reproducing kernel Hilbert space; \(\rho\)-mixing sequences
  • Journal: Machine Learning
  • Publication date: January 2016
  • Year: 2016
  • Volume: 102
  • Issue: 1
  • Pages: 57-83
  • Full text size: 846 KB
  • Author affiliations: Subhajit Dutta (1)
    Anil K. Ghosh (2)

    1. Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur, 208016, U.P., India
    2. Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata, 700108, India
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Automation and Robotics
    Computing Methodologies
    Simulation and Modeling
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN:1573-0565
Abstract
For data with more variables than the sample size, phenomena like concentration of pairwise distances, violation of cluster assumptions, and presence of hubness often have adverse effects on the performance of the classic nearest neighbor classifier. To cope with such problems, dimension reduction techniques such as those based on random linear projections and principal component directions have been proposed in the literature. In this article, we construct nonlinear transformations of the data based on inter-point distances, which also reduce the data dimension. More importantly, for such high dimension, low sample size data, they enhance separability among the competing classes in the transformed space. When the classic nearest neighbor classifier is used on the transformed data, it usually yields lower misclassification rates. Under appropriate regularity conditions, we derive asymptotic results on the misclassification probabilities of nearest neighbor classifiers based on the \(l_2\) norm and the \(l_p\) norms (with \(p \in (0,1]\)) in the transformed space, when the training sample size remains fixed and the dimension of the data grows to infinity. The strength of the proposed transformations in the classification context is demonstrated by analyzing several simulated and benchmark data sets.
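The abstract only outlines the construction, so the following is a minimal illustrative sketch rather than the authors' exact method: each observation is mapped to its vector of \(l_p\) distances to the training sample (an inter-point distance transformation, which replaces the \(d\) original coordinates with one coordinate per training point), and the classic 1-NN classifier is then run in that transformed space. The function name distance_transform, the choice of sklearn's KNeighborsClassifier, and the toy mean-shift data are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier

def distance_transform(X, X_train, p=2.0):
    """Map each row of X to its vector of l_p distances to the
    training observations: a nonlinear, inter-point-distance
    embedding whose dimension is the training sample size."""
    if p >= 1:
        return cdist(X, X_train, metric="minkowski", p=p)
    # For p in (0, 1], use the l_p dissimilarity sum |x_i - y_i|^p
    # (one common convention; any fixed p gives the same 1-NN ranking
    # whether or not the 1/p root is taken).
    diffs = np.abs(X[:, None, :] - X_train[None, :, :]) ** p
    return diffs.sum(axis=2)

# Toy HDLSS setting: d = 1000 variables, n = 40 training points,
# two classes separated by a small mean shift in every coordinate.
rng = np.random.default_rng(0)
n, d = 20, 1000
X_train = np.vstack([rng.normal(0.0, 1.0, size=(n, d)),
                     rng.normal(0.3, 1.0, size=(n, d))])
y_train = np.array([0] * n + [1] * n)
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, d)),
                    rng.normal(0.3, 1.0, size=(10, d))])
y_test = np.array([0] * 10 + [1] * 10)

# Classic 1-NN, applied in the transformed space.
Z_train = distance_transform(X_train, X_train)
Z_test = distance_transform(X_test, X_train)
clf = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```

The intuition behind this kind of sketch matches the geometric effect the abstract alludes to: in the transformed space each point is described by its profile of distances to the training sample, and class differences in location or scale, accumulated over many original coordinates, tend to show up as stable differences between these profiles even when the raw pairwise distances concentrate.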
