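Research on Computer Vision Feature Representation and Learning for Image Classification and Recognition (面向图像分类和识别的视觉特征表达与学习的研究)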

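English title: Research on Computer Vision Feature Representation and Learning for Image Classification and Recognition
Author: Yang Zhao (杨钊)
Degree level: Doctoral
Discipline: Information and Communication Engineering
Keywords (Chinese): 视觉特征; 特征表达; 特征学习; 深度学习
Keywords (English): Computer vision features; Feature representation; Feature learning; Deep learning
Degree year: 2014
Advisor: Jin Lianwen (金连文)
Discipline code: 0810
Degree-granting institution: South China University of Technology (华南理工大学)
Thesis submission date: 2014-04-08
Defense committee chair: Lai Jianhuang (赖剑煌)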

Abstract

Visual feature extraction is a key step in image classification and recognition. Well-designed features reduce the dependence on complex downstream machine learning algorithms, and feature quality directly limits the performance of the whole vision system. Feature extraction has therefore long been an important research direction in computer vision. Over the years, researchers have proposed many feature extraction methods for specific classification problems, including basic color features, texture features, local features and global features, and these have worked well on a range of image classification and recognition tasks. However, traditional feature extraction methods have two problems.
     First, as vision tasks grow in scale and complexity, basic features often perform poorly when used directly for classification. Researchers therefore proposed "feature representation" methods, which apply vector quantization, sparse coding or other encodings to the basic features to form the final feature of an image. The most typical is the "Bag of Words" (BoW) model, which gathers statistics of an image's basic features against a dictionary to form the final representation. Methods based on this idea have been widely studied and applied since 2006 and have achieved very good performance in image classification and recognition.
     Second, for a given vision problem, obtaining satisfactory features usually requires strong prior knowledge or repeated trials of different features and parameter settings, which complicates the whole classification problem. "Feature learning" research, which has emerged since 2007, starts from raw pixels and uses a specific neural network structure to automatically discover hidden patterns in images and learn effective features. Typical approaches are feature learning based on single-layer networks and feature learning based on deep architectures; both have been applied successfully to image classification and recognition.
     In view of the above, this dissertation studies visual feature representation and learning for image classification and recognition, with the aim of extracting effective visual features. Building on an analysis of current feature extraction methods, it proposes new feature representation and learning methods and applies them to specific vision problems. The main work and contributions are as follows:
     1. We propose a feature extraction method for Kinect images based on locality-constrained linear coding (LLC). Dense SIFT features are extracted separately from the RGB image and the depth image of a Kinect image pair, and each is encoded with LLC to form the feature representation of the pair. The features are applied to scene classification and object classification, and experiments on the NYU Depth and B3DO datasets verify the effectiveness of the representation.
     2. We conduct a comparative study of feature extraction methods for person re-identification and propose applying LLC to HSV and Lab color histograms to improve the re-identification rate. In addition, to address the complexity of the feature extraction methods in the recent literature, we propose an appearance model called Object-Centric Coding (OCC). OCC first extracts the silhouette of the pedestrian with Stel Component Analysis (SCA), then extracts dense SIFT features from that region and encodes them with LLC, so that the descriptor focuses on the body and the influence of cluttered background is reduced. We run person re-identification experiments on the VIPeR dataset and evaluate OCC under several metric (distance) learning methods. The results show that OCC significantly improves re-identification accuracy and gives high, consistent recognition rates across the different metric learning methods.
     3. We analyze and compare several common feature learning methods based on single-layer networks and propose L2-regularized sparse filtering. The method preserves the sparsity of the learned features while constraining the feature mapping weight matrix, which improves generalization. Comparative experiments on four datasets, STL-10, CIFAR-10, small NORB and subsets of the CASIA-HWDB1.0 offline handwritten Chinese characters, show that it outperforms the original sparse filtering.
     4. We study feature learning based on deep learning. In traditional two-level handwritten Chinese character recognition systems, Similar Handwritten Chinese Character Recognition (SHCCR) is limited by the feature extraction method. We propose using Convolutional Neural Networks (CNN) to learn effective features for similar characters automatically and to perform recognition, and we train the model with large-scale data from a handwriting cloud platform to further improve the recognition rate. Experiments show a considerable improvement in recognition rate over the traditional gradient-feature-based Support Vector Machine (SVM) and nearest-neighbor (1-NN) classifiers.
     These results show that effective feature representation methods can greatly improve image classification and recognition performance, and that feature learning based on single-layer networks and deep architectures can learn effective features from raw image data, avoiding the complexity of hand-crafted feature design. Feature learning is a frontier research direction with broad application prospects.
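As a minimal sketch of the Bag of Words idea described in the abstract, the Python below assigns each local descriptor to its nearest dictionary entry (hard vector quantization) and counts the assignments into a normalized histogram. The function names, the toy two-word codebook and the 2-D descriptors are hypothetical and chosen only for illustration; the dissertation itself uses dense SIFT descriptors with locality-constrained linear coding, which this sketch does not implement.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, codebook):
    """Quantize every descriptor and return the normalized codeword histogram."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram instead of dividing by zero.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (illustrative only): two codewords, four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.2, 0.8], [0.0, 0.2]]
print(bow_histogram(descriptors, codebook))  # prints [0.5, 0.5]
```

The histogram has one bin per dictionary entry, so images with different numbers of local descriptors map to vectors of the same length, which is what lets a standard classifier consume them.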
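The L2-regularized sparse filtering contribution can be sketched as an objective function. The Python below follows the standard sparse filtering recipe (soft absolute value, normalization of each feature across examples, normalization of each example across features, then an L1 sum) and adds a squared L2 penalty on the weight matrix. The abstract states only that the weight matrix is constrained, so the exact penalty form (the sum of squared weights scaled by `lam`), the parameter names `lam` and `eps`, and the toy data are assumptions made for illustration, not the dissertation's definition.

```python
import math


def sparse_filtering_objective(W, X, lam=0.0, eps=1e-8):
    """Sparse filtering objective plus an assumed L2 weight penalty.

    W: list of weight rows, one row per learned feature.
    X: list of input examples, each as long as a weight row.
    lam: weight of the assumed penalty on the squared entries of W.
    """
    # Soft absolute value of the linear features, laid out as features x examples.
    F = [[math.sqrt(sum(w * x for w, x in zip(row, ex)) ** 2 + eps) for ex in X]
         for row in W]
    # Normalize each feature (row) by its L2 norm across examples.
    F = [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in F]
    # Normalize each example (column) by its L2 norm across features.
    col_norms = [math.sqrt(sum(row[i] ** 2 for row in F)) for i in range(len(X))]
    F = [[v / col_norms[i] for i, v in enumerate(row)] for row in F]
    # L1 sum of the normalized features measures (lack of) sparsity.
    sparsity = sum(sum(row) for row in F)
    # Assumed regularizer: squared L2 (Frobenius) norm of the weights.
    penalty = lam * sum(w * w for row in W for w in row)
    return sparsity + penalty


# Toy data (illustrative only): two examples, two candidate weight matrices.
examples = [[1.0, 0.0], [0.0, 1.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]
dense_weights = [[1.0, 1.0], [1.0, 1.0]]
print(sparse_filtering_objective(identity_weights, examples))
print(sparse_filtering_objective(dense_weights, examples))
```

With these toy inputs the identity weights give a lower (sparser) objective than the all-ones weights, and raising `lam` adds a cost proportional to the squared weight magnitudes. In practice the objective would be minimized over `W` with a gradient-based optimizer, which this sketch leaves out.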

引文

[1] T. Ahonen, A. Hadid, M. Pietikainen,“Face description with local binary patterns:application to face recognition”[J], IEEE Transaction on Pattern Analysis and MachineIntelligence,28(12):971-987,2002.
    [2] I. Arel, D. C. Rose, T. P. Karnowski,“Deep machine learning–a new frontier inartificial intelligence research”[J], IEEE Computational Intelligence Magazine,2010.
    [3] Z. Bai, Q. Huo,“A study on the use of8-directional features for online handwrittenChinese character recognition”[C], in International Conference on Document Analysisand Recognition,2005.
    [4] Baidu opens a deep learning lab in the Silicon Valley,http://deeplearning.net/2013/04/13/baidu-opens-a-deep-learning-lab-in-the-silicon-valley/.
    [5] S. Bak, E. Corvee, F. Brémond, et al,“Person re-identification using spatial covarianceregions of human body parts”[C], in IEEE International Conference on AdvancedVideo and Signal Based Surveillance,2010.
    [6] M. Belkin, P. Niyogi,“Laplacian eigenmaps and spectral techniques for embedding andclustering”[C], in Neural Information Processing Systems,2001.
    [7] Y. Bengio, http://www.iro.umontreal.ca/~bengioy/yoshua_en/index.html.
    [8] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle,“Greedy layer-wise training of deepnetworks”, in Neural Information Processing Systems,2007.
    [9] Y. Bengio,“Learning deep architectures for AI”[J], Foundations and trends in MachineLearning,2(1):1-127,2009.
    [10] Y. Bengio, A. Courville,“Deep Learning of Representations”[M], Handbook on NeuralInformation Processing, Springer Berlin Heidelberg, pp.1-28,2013.
    [11] Y. Bengio, A. Courville, P. Vincent,“Representation learning: a review and newperspectives”[J], IEEE Transactions on Pattern Analysis and Machine Intelligence,35(8):1798-1828,2013.
    [12] L. Bo, X. Ren, D. Fox,“Depth kernel descriptors for object recognition”[C], in IEEEInternational Conference on Intelligent Robots and Systems,2011.
    [13] L. Bo, L. Lai, X. Ren, et al,“Object recognition with hierarchical kernel descriptors”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2011.
    [14] A. Bosch, A. Zisserman, X Munoz,“Representing shape with a spatial pyramid kernel”[C], in ACM International Conference on Image and video retrieval,2007.
    [15] J. Bouvrie,“Notes on convolutional neural networks”[R], MIT CBCL Tech Report,2006.
    [16] C. C. Chang, C. J. Lin,“LIBSVM: a library for support vector machines”,http://www.csie.ntu.edu.tw/~cjlin/libsvm/,2001.
    [17] D. Cai, H. Bao, X. He,“Sparse concept coding for visual analysis”[C], in IEEEConference on Computer Vision and Pattern Recognition,2011.
    [18] Y. Cai, M. Pietikainen,“Person re-identification based on global color context”[C], inACCV International Workshop on Computer Vision,2010.
    [19] K. Chatfield, V. Lenmtexpisky, A. Vedaldi, A. Zisserman,“The devil is in the details: anevaluation of recent feature encoding methods”[C], in British Machine VisionConference,2011.
    [20] C.-K. Chiang, C.-H. Duan, S.-H. Lai,“Learning component-level sparse representationusing histogram information for image classification”[C], in IEEE InternationalConference on Computer Vision,2011.
    [21] D. Ciresan, U. Meier, L. Gambardella, et al,“Convolutional neural network committeesfor handwritten character classification”[C], in International Conference on DocumentAnalysis and Recognition,2011.
    [22] D. Ciresan, U. Meier, J. Schmidhuber,“Multi-column deep neural networks for imageclassification”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2012.
    [23] A. Coates, B. Carpenter, C. Case, et al.“Text detection and character recognition inscene images with unsupervised feature learning”[C], in International Conference onDocument Analysis and Recognition,2011.
    [24] A. Coates, H. Lee, A. Y. Ng,“An analysis of single-layer networks in unsupervisedfeature learning”[C], in International Conference on Artificial Intelligence andStatistics,2011.
    [25] A. Coates, A. Y. Ng,“Selecting receptive fields in deep networks”[C], in NeuralInformation Processing Systems,2011.
    [26] A. Coates, A. Karpathy, A. Y. Ng,“Emergence of object-selective features inunsupervised feature learning,” in Neural Information Processing Systems,2012.
    [27] A. Coates, A. Y. Ng,“Learning feature representation with K-means”[J], NeuralNetworks: Tricks of the Trade,7700:561-580,2012.
    [28] A. Coates,“Demystifying unsupervised feature learning”[D], PhD. thesis, StanfordUniversity,2012.
    [29] T. Cover,“Estimation by the nearest neighbor rule”[J], IEEE Transaction onInformation Theory,14(1):50-55,1968.
    [30] J. L. Crowley, A. C. Parker,“A representation for shape based on peaks and ridges inthe difference of low-pass transform”[J], IEEE Transaction on Pattern Analysis andMachine Intelligence,6:156-170,1984.
    [31] N. Dalal, B. Triggs,“Histograms of oriented gradients for human detection”[C], inIEEE Conference on Computer Vision and Pattern Recognition,2005.
    [32] J. Davis, B. Kulis, P. Jain, et al,“Information-Theoretic Metric Learning”[C], inInternational Conference on Machine learning,2007.
    [33] L. Deng, G. Hinton, B. Kingsbury,“New types of deep neural network learning forspeech recognition and related applications: An overview”[C], in IEEE InternationalConference on Acoustics, Speech, and Signal Processing,2013.
    [34] H. Deng, G. Stathopoulos, C. Suen,“Error-correcting output coding for theconvolutional neural network for optical character recognition”[C], in InternationalConference on Document Analysis and Recognition,2009.
    [35] I. S. Dhillon, D. M. Modha,“Concept decompositions for large sparse text data usingclustering”[J], Machine Learning,42(1):143-175,2011.
    [36] J. Donahue, Y. Jia, O. Vinyals, et al,“DeCAF: a deep convolutional activation featurefor generic visual recognition”[C], in Neural Information Processing Systems,2013.
    [37] D. Erhan, Y. Bengio, A. Courville, et al,“Visualizing higher-layer features of a deepnetwork”[R], Technical Report, University of Montreal,2009.
    [38] A. Ess, B. Leibe, L. Van Gool,“Depth and Appearance for Mobile Scene Analysis”[C],in IEEE International Conference on Computer Vision,2007.
    [39] M. Everingham, L. Van Gool, C. K. I. Williams, et al,“The PASCAL visual objectclasses challenge2007(VOC2007) Results”,http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/,2007.
    [40] R. Fan, K. Chang, C. Hsieh, et al,“LIBLINEAR: A library for large linear classification”[J], Journal of Machine Learning Research,9:1871-1874,2008.
    [41] M. Farenzena, L. Bazzani, A. Perina, V. Murino, et al.“Person Re-identification bySymmetry-Driven Accumulation of Local Features”[C], in IEEE Conference onComputer Vision and Pattern Recognition,2010.
    [42] L. Fei-Fei, R. Fergus, P. Perona,“Learning generative visual models from few trainingexamples: an incremental Bayesian approach test on101object categories”[C], inIEEE CVPR Workshop on Generative-Model Based Vision,2004.
    [43] L. Fei-Fei, P. Perona,“A Bayesian hierarchical model for learning natural scenecategories”, in IEEE Conference on Computer and Pattern Recognition,2005.
    [44] R. Fisher,“The use of multiple measurements in taxonomic problems”[J], Annals ofEugenics,7(7):179-188,1936.
    [45] A. Ford, A. Roberts,“Colour Space Conversions”[J], Westminster University,1-31,1998.
    [46] E. Forgy,“Cluster analysis of multivariate data: efficiency vs. interpretability ofclassification”[J], Biometric,21:768,1965.
    [47] Google s new deep learning algorithm transcribes house numbers,http://deeplearning.net/2014/01/07/googles-new-deep-learning-algorithm-transribes-house-numbers/.
    [48] T. Gao, C. Liu,“High accuracy handwritten Chinese character recognition usingLDA-based compound distances”[J], Pattern Recognition,41(11):3442-3451,2008.
    [49] Y. Gao, L. Jin, C. He, et al,“Handwriting character recognition as a service: A newhandwriting recognition system based on cloud computing”[C], in InternationalConference on Document Analysis and Recognition,2011.
    [50] N. Gheissari, T. Sebastian, and R. Hartley,“Person reidentification using spatiotemporalappearance”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2006.
    [51] A. Globerson and S. Roweis,“Metric learning by collapsing classes”[C], in NeuralInformation Processing Systems,2005.
    [52] K. Grauman, T. Darrell,“Pyramid match kernels: Discriminative classification with setsof image features”[C], in IEEE International Conference on Computer Vision,2005.
    [53] D. Gray and H. Tao,“Viewpoint invariant pedestrian recognition with an ensemble oflocalized features”[C], in European Conference on Computer Vision,2008.
    [54] D. Gray, S. Brennan, H. Tao,“Evaluating Appearance Models for Recognition,Reacquisition, and Tracking”[C], in IEEE International Workshop on PerformanceEvaluation of Tracking and Surveillance,2007.
    [55] G. Griffin, A. Holub, P. Perona,“Caltech-256object category dataset”, CaliforniaInstitute of Technology,2007.
    [56] M. Guillaumin, J. Verbeek, C. Schmid,“Is that you? Metric learning approaches forface identification”[C], in IEEE International Conference on Computer Vision,2009.
    [57] C. Harris, M. Stephens,“A combined corner and edge detector”[C], Alvey VisionConference,1988.
    [58] S. Hawe, M. Seibert, M. Kleinsteuber,“Separable dictionary learning”[C], in IEEEConference on Computer Vision and Pattern Recognition,2013.
    [59] G. E. Hinton, https://www.cs.toronto.edu/~hinton/.
    [60] G. E. Hinton, R. Salakhutdinov,“Reducing the dimensionality of data with neuralnetworks”[J], Science,313(5786):504-507,2006.
    [61] G. E. Hinton, S. Osindero, Y. W. The,“A fast learning algorithm for deep belief nets”[J], Neural computation,18(7):1527–1554,2006.
    [62] G. E. Hinton,“Learning multiple layers of representation”[J], Trends in CongnitiveScience,11(10):428-434,2007.
    [63] G. E. Hinton,“Improving neural networks by preventing co-adaptation of featuredetectors”[J], arXiv:1207.0580,2012.
    [64] C. Hsu, C. Chang, and C. Lin.“A practical guide to support vector classification”,Technical report, Department of Computer Science and Information Engineering,National Taiwan University, http://www.csie.ntu.edu.tw/~cjlin/libsvm/,2003.
    [65] X. Hu, P. Qi, B. Zhang,“Hierarchical K-means algorithm for modeling visual area V2neurons”[C], in International Conference on Neural Information Processing,2012.
    [66] Q. Huang, Z. Yang, W. Hu, L. Jin,“Linear Tracking for3-D Medical UltrasoundImaging”[J], IEEE transactions on Cybernetics,43(6):1747-1754,2013.
    [67] J. Huang,“Color-spatial image indexing and application”[D], PhD. thesis, CornellUniversity,1998.
    [68] A. Hyv rinen, E. Oja,“Independent component analysis: algorithms and applications”[J], Neural Networks,13(4-5):411-430,2000.
    [69] A. Janoch, S. Karayev, Y. Jia, et al,“A category-level3-D object dataset: Putting theKinect to work”[C], in ICCV Workshop on Consumer Depth Cameras for ComputerVision,2011.
    [70] K. Jarrett, K. Kavukcuoglu, M. Ranzato, et al,“What is the best multi-stage architecturefor object recognition”[C], in IEEE International Conference on Computer Vision,2009.
    [71] Yi, Ji, K. Idrissi, A. Baskurt,“Object categorization using boosting within hierarchicalbayesian model”[C], in IEEE International Conference on Image Processing,2009.
    [72] Y. Jia, C. Huang, T. Darrell,“Beyond spatial pyramids: receptive field learning forpooled image features”[C], in IEEE Conference on Computer Vision and PatternRecognition,2012.
    [73] L. Jin, G. Wei,“Handwritten Chinese character recognition with directionaldecomposition cellular features”[J], Journal of Circuit, System and Computer,8(4):517-520,1998.
    [74] R. S. John, S. F. Chang,“Transform features for texture classification anddiscrimination in large image database”[C], in IEEE International Conference onImage Processing,1994.
    [75] A. E. Johnson, M. Hebert,“Using spin images for efficient object recognition incluttered3D scenes”[J], IEEE Transactions on Pattern Analysis and MachineIntelligence,1999.
    [76] N. Jojic, A. Perina, M. Cristani, V. Murino, et al,“Stel component analysis: Modelingspatial correlations in image class structure”[C], in IEEE Conference on ComputerVision and Pattern Recognition,2009.
    [77] I.T. Jolliffe,“Principal component analysis, second edition”[M], Springer-Verlag, NewYork,2002.
    [78] N. Jones,“Computer science: the learning machines”[J], Nature,505:146-148,2014.
    [79] K. Kavukcuoglu, M. Ranzato, R. Fergus,“Learning invariant features throughtopographic filter maps”[C], in IEEE Conference on Computer Vision and PatternRecognition,2009.
    [80] K. Kavukcuoglu, P. Sermanet, Y. Boureau,“Learning convolutional feature hierarchiesfor visual recognition”[C], in Neural Information Processing Systems,2010.
    [81] Microsoft Kinect, http://www.xbox.com/en-us/kinect.
    [82] Microsoft s Richard Rashid demos deep learning for speech recognition in China,http://deeplearning.net/2012/12/13/microsofts-richard-rashid-demos-deep-learning-for-speech-recognition-in-china/.
    [83] Y. Ke, R. Sukthankar,“PCA-sift: A more distinctive representation for local imagedescriptors”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2004.
    [84] R. Kiros, C. Szepesvari,“On linear embeddings and unsupervised feature learning”[C],in International Conference on Machine Learning,2012.
    [85] T. Kobayashi,“BoF meets HOG: feature extraction based on histograms of orientedp.d.f gradients for image classification”[C], in IEEE Conference on Computer Visionand Pattern Recognition,2013.
    [86] L, Kong, Q. Huang, M. Lu, S. Zheng, L. Jin, S. Chen,“Accurate image registrationusing SIFT for extended-filed of view sonography”[C], in International Conference onBioinformatics and Biomedical Engineering,2010.
    [87] M. Kostinger, M. Hirzer, P. Wohlhart, et al,“Large scale metric learning fromequivalence constraints”[C], in IEEE Conference on Computer Vision and PatternRecognition,2012.
    [88] A. Krizhevsky, G. Hinton,“Learning multiple layers of features from tiny images”[D],Master's thesis, Department of Computer Science, University of Toronto,2009.
    [89] A. Krizhevsky, I. Sutskever, G. E. Hinton,“ImageNet classification with deepconvolutional neural networks”, in Neural Information Processing Systems,2012.
    [90] K. Lai, L. Bo, X. Ren, et al,“Alarge-scale hierarchical multi-view rgb-d object dataset”[C], in IEEE International Conference on Robotics and Automation,2011.
    [91] S. Lazebnik, C. Schmid, and J. Ponce,“Beyond bags of features: Spatial pyramidmatching for recognizing natural scene categories”[C], in IEEE Conference onComputer Vision and Pattern Recognition,2006.
    [92] Q. V. Le, J. Ngiam, Z. Chen,“Tiled convolutional neural networks”[C], in NeuralInformation Processing Systems,2010.
    [93] Q. V. Le, J. Ngiam, A. Coates, et al,“On optimization methods for deep learning”[C],in International Conference on Machine Learning,2011.
    [94] Q. V. Le, M. Ranzato, R. Monga, et al,“Building high-level features using large scaleunsupervised learning”[C], in International Conference on Machine Learning,2012.
    [95] Yann LeCun s facebook post and his decision,http://deeplearning.net/2013/12/10/facebook-hires-yann-lecun/.
    [96] Y. LeCun, http://yann.lecun.com/.
    [97] Y. LeCun, C. Cortes, C. J. C. Burges,“THE MNIST DATABASE”,http://yann.lecun.com/exdb/mnist/.
    [98] Y. LeCun, B. Boser, J. S. Denker, et al,“Handwritten digit recognition with aback-propagation network”[C], in Neural Information Processing Systems,1990.
    [99] Y. LeCun, F. J. Huang, L. Bottou,“Learning methods for generic object recognitionwith invariance to pose and lighting”[C], in IEEE Conference on Computer Vision andPattern Recognition,2004.
    [100] Y. LeCun, K. Kavukcuoglu, C. Farabet,“Convolutional networks and applications invision”[C], in IEEE International Symposium on Circuits and Systems,2010.
    [101] Y. LeCun,“Learning invariant feature hierarchies”[C], in European Conference onComputer Vision,2012.
    [102] T. S. Lee,“Image representation using2D Gabor wavelets”[J], IEEE Transactions onPattern Analysis and Machine Intelligence,18(10):959-971,1996.
    [103] H. Lee, A. Battle, R. Raina, et al,“Efficient sparse coding algorithms”[C], in NeuralInformation Processing Systems,2007.
    [104] H. Lee, C. Ekanadham, and A. Y. Ng,“Sparse deep belief net model for visual area V2”[C], in Neural Information Processing Systems,2008.
    [105] H. Lee, R. Grosse, R. Ranganath, et al,“Convolutional deep belief networks forscalable unsupervised learning of hierarchical representation”[C], in InternationalConference on Machine Learning,2009.
    [106] J. Lee, R. Jin, and A. Jain.“Rank-based distance metric learning: An application toimage retrieval”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2008.
    [107] I. Lenz, H. Lee, A. Sacena,“Deep learning for detecting robotic grasps”[C], inInternational Conference on Learning Representations,2013.
    [108] K. Leung, C. Leung,“Recognition of handwritten Chinese characters by critical regionanalysis”[J]. Pattern Recognition,43(3):949-961,2010.
    [109] J. Liu, J. Luo, M. Shah,“Recognizing realistic actions from videos in the wild”[C], inIEEE Conference on Computer Vision and Pattern Recognition,2009.
    [110] C.-L. Liu,“Normalization-cooperated gradient feature extraction for handwrittencharacter recognition”[J], IEEE Transaction on Pattern Analysis and MachineIntelligence,29(8):1465-1469,2007.
    [111] C.-L. Liu, F. Yin, D.-H. Wang, et al,“CASIA online and offline Chinese handwritingdatabases”[C], in International Conference on Document Analysis and Recognition,2011.
    [112] C.-L. Liu, F. Yin, Q.-F. Wang, et al,“ICDAR2011Chinese handwriting recognitioncompetition”[C], in International Conference on Document Analysis and Recognition,2011.
    [113] C.-L. Liu, F. Yin, D.-H. Wang, et al,“Online and offline handwritten Chinese characterrecognition: benchmarking on new databases”[J], Pattern Recognition,46(1):155-162,2013.
    [114] Z. Liu, L. Jin,“A static candidates generation technique and its application in two-stageLDAChinese character recognition”[C], in Chinese Control Conference,2007.
    [115] D. Lowe,“Object recognition from local scale-invariant features”[C], in IEEEInternational Conference on Computer Vision,1999.
    [116] D. Lowe,“Distinctive image features from scale-invariant keypoints”[J], InternationalJournal of Computer Vision,60(2):91-110,2004.
    [117] W. Y. Ma, B. S. Manjunath,“A comparison of wavelet transform features for textureimage annotation”[C], in IEEE International Conference on Image Processing,1995.
    [118] C. Madden, E. Dahai Cheng, and M. Piccardi,“Tracking people across disjoint cameraviews by an illumination-tolerant appearance representation”[J], Machine Vision andApplications,18(3):233–247,2007.
    [119] B. Mcfee, G. Lanckriet,“Metric learning to rank”[C], in International Conference onMachine Learning,2010.
    [120] K. Mikolajczyk, C. Schmid,“A performance evaluation of local descriptors”[J], IEEETransactions on Pattern Analysis and Machine Intelligence,27(10):31–47,2005.
    [121] Mit Technology Review,10breakthrough technologies2013,http://www.technologyreview.com/lists/breakthrough-technologies/2013/.
    [122] H. Moravec,“Towards automatic visual obstacle avoidance”[C], in InternationalConference on Artificial Intelligence,1977.
    [123] N. Morioka, S. Staoh,“Compact correlation coding for visual object categorization”[C], in IEEE International Conference on Computer Vision,2011.
    [124] Y. Netzer, T. Wang, A. Coates, et al,“Reading digits in natural images withunsupervised feature learning”[C], in NIPS Workshop on Deep Learning andUnsupervised Feature,2011.
    [125] A. Y. Ng, http://cs.stanford.edu/people/ang/.
    [126] J. Ngiam, P. W. Koh, Z. Chen, et al,“Sparse filtering”[C], in Neural InformationProcessing Systems,2012.
    [127] A. Oliva, A. Torralba,“Modeling the shape of the scene: A holistic representation of thespatial envelope”[J], International Journal of Computer Vision,42(3):145-175,2001.
    [128] T. Ojala, M. Pietikainen, D. Harwood,“Performance evaluation of texture measureswith classification based on Kullback discrimination of distributions”[C], inInternational Conference on Pattern Recognition,1994.
    [129] T. Ojala, M. Pietikainen, D. Harwood,“A comparative study of texture measures withclassification based on feature distributions”[J], Pattern Recognition,29(1):51-59,1996.
    [130] T. Ojala, M. Pietikainen, T. Maenpaa,“Multiresolution Gray-Scale and RotationInvariant Texture Classification with Local Binary Patterns”[J], IEEE Transactions onPattern Analysis and Machine Intelligence,24(7),971-987,2002.
    [131] N. Papadakis, E. Provenzi, V. Caselles,“A variational model for histogram transfer ofcolor images”[J], IEEE Transactions on Image Processing,20(6):1682-1695,2011.
    [132] G. Pass, R. Zabin, J. Miller,“Comparing images using color coherence vectors”, inACM International Conference on Multimedia,1996.
    [133] U. Porwal, Y. Zhou, V. Govindaraju,“Handwritten Arabic text recognition using deepbelief networks”[C], in International Conference on Pattern Recognition,2012.
    [134] B. Prosser, S. Gong, and T. Xiang,“Multi-camera matching under illumination changeover time”[C], in ECCV Workshop on Multi-camera and Multi-modal Sensor FusionAlgorithms and Applications, Marseille,2008.
    [135] B. Prosser, W.-S. Zheng, S. Gong et al,“Person re-identification by support vectorranking”[C], in British Machine Vision Conference,2010.
    [136] T. J. Randen, J. H. Husoy,“Filtering for texture classification: A comparative study”[J],IEEE transactions on Pattern Analysis and Machine Intelligence,21(4):291-310,1999.
    [137] M. Ranzato, C. Poultney, S. Chopra, et al,“Efficient learning of sparse representationswith an energy-based model”[C], in Neural Information Processing Systems,2006.
    [138] A. Rao, R. Srihari, Z. Zhang,“Spatial color histogram for content-based retrieval”[C],in IEEE International Conference on Tools with AI,1999.
    [139] S. Roweis, L. Saul,“Nonlinear dimensionality reduction by locally linear embedding”[J], Science,290(5500):2323-2326,2000.
    [140] O. Russakovsky, Y. Lin, K. Yu, et al,“Object-centric spatial pooling for imageclassification”[C], in European Conference on Computer Vision,2012.
    [141] K. Sande, T. Gevers, C. G. M. Snoek,“Evaluating color descriptors for object and scenerecognition”[J], IEEE Transactions on Pattern Analysis and Machine Intelligence,32(9):1582-1596,2010.
    [142] A. M. Saxe, P. W. Koh, Z. Chen, et al,“On random weights and unsupervised featurelearning”[C], in International Conference on Machine Learning,2011.
    [143] W. R. Schwartz, L. S. Davis,“Learning Discriminative Appearance-based ModelsUsing Partial Least Squares”[C], in Brazilian Symposium on Computer Graphics andImage Processing,2009.
    [144] M. Schultz, T. Joachims,“Learning a distance metric from relative comparisons”[C], inNeural Information Processing Systems,2004.
    [145] M. Schmidt, minFunc, http://www.di.ens.fr/~mschmidt/Software/minFunc.html.
    [146] P. Sermanet, S. Chintala, Y. LeCun,“Convolutional neural networks applied to house numbers digit classification”[C], in International Conference on Pattern Recognition,2012.
    [147] P. Sermanet, K. Kavukcuoglu, S. Chintala,“Pedestrian detection with unsupervised multi-stage feature learning”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2013.
    [148] A. Shabou, H. Borgne,“Locality-constrained and spatially regularized coding for scene categorization”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2012.
    [149] J. Shlens,“A tutorial on principal component analysis”[M], Systems Neurobiology Laboratory, University of California at San Diego,2005.
    [150] N. Silberman, R. Fergus,“Indoor scene segmentation using a structured light sensor”[C], in ICCV Workshop on 3D Representation and Recognition,2011.
    [151] P. Y. Simard, D. Steinkraus, J. Platt,“Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”[C], in International Conference on Document Analysis and Recognition,2003.
    [152] S. Smith, J. Brady,“SUSAN-A new approach to low level image processing”[J], International Journal of Computer Vision,23(1):45-78,1997.
    [153] D. M. Squire, W. Muller, T. Pun,“Content-based query of image databases: inspirations from text retrieval”[J], Pattern Recognition Letters,21(13-14):1193-1198,2000.
    [154] M. Stricker, M. Orengo,“Similarity of color images”[C], in SPIE Storage and Retrieval for Image and Video Database,1995.
    [155] M. Swain, D. Ballard,“Color indexing”[J], International Journal of Computer Vision,7(1):11-32,1991.
    [156] A. Szlam, K. Gregor, Y. LeCun,“Fast approximations to structured sparse coding and applications to object classification”[C], in European Conference on Computer Vision,2012.
    [157] Y. Tang, A. Mohamed,“Multiresolution deep belief networks”[C], in International Conference on Artificial Intelligence and Statistics,2012.
    [158] D. Tao, L. Liang, L. Jin, et al,“Similar handwritten Chinese character recognition by kernel discriminative locality alignment”[J], Pattern Recognition Letters,35(1):186-194,2014.
    [159] D. Tao, L. Jin, Z. Yang, et al,“Rank Preserving Sparse Learning for Kinect based Scene Classification”[J], IEEE Transactions on Cybernetics,43(5):1406-1417,2013.
    [160] D. Tao, L. Jin, S. Zhang, Z. Yang, et al,“Sparse Discriminative Information Preservation for Chinese Character Font Categorization”[J], Neurocomputing,2013.
    [161] J. Tenenbaum, V. Silva, J. Langford,“A global geometric framework for nonlinear dimensionality reduction”[J], Science,290(5500):2319-2323,2000.
    [162] V. Vapnik,“The nature of statistical learning theory”[M], Springer, New York,2000.
    [163] L. Vaquero, J. Caceres, D. Moran,“The Challenge of Service Level Scalability for the Cloud”[J], International Journal of Cloud Applications and Computing,1(1):34-44,2011.
    [164] A. Vedaldi, A. Zisserman,“Sparse Kernel Approximations for Efficient Classification and Detection”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2012.
    [165] P. Vincent, H. Larochelle, Y. Bengio, et al,“Extracting and composing robust features with denoising autoencoders”[C], in International Conference on Machine Learning,2008.
    [166] L. Wan, M. Zeiler, S. Zhang, et al,“Regularization of neural networks using dropconnect”[C], in International Conference on Machine Learning,2013.
    [167] J. Wang, J. Yang, K. Yu, et al,“Locality-constrained linear coding for image classification”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2010.
    [168] X. Wang, G. Doretto, T. Sebastian, J. Rittscher, et al,“Shape and appearance context modeling”[C], in IEEE International Conference on Computer Vision,2007.
    [169] J. Wang, J. Yang, K. Yu, et al,“Locality-constrained linear coding for image classification”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2010.
    [170] T. Wang, D. Wu, A. Coates, et al,“End-to-end text recognition with convolutional neural networks”[C], in International Conference on Pattern Recognition,2012.
    [171] K. Weinberger, J. Blitzer, and L. Saul,“Distance metric learning for large margin nearest neighbor classification”[C], in Neural Information Processing Systems,2006.
    [172] J. Xiao, J. Hays, K. Ehinger, et al,“Sun Database: Large-scale scene recognition from abbey to zoo”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2010.
    [173] S. Xiang, F. Nie, and C. Zhang,“Learning a mahalanobis distance metric for data clustering and classification”[J], Pattern Recognition,41(12):3600-3612,2008.
    [174] E. Xing, A. Ng, M. Jordan,“Distance metric learning, with application to clustering with side-information”[C], in Neural Information Processing Systems,2002.
    [175] S. Yan, X. Xu, D. Xu, et al,“Beyond spatial pyramids: a new feature extraction framework with dense spatial sampling for image classification”[C], in European Conference on Computer Vision,2012.
    [176] Z. Yang, L. Jin, D. Tao,“Kinect Image Classification using LLC”[C], in ACM International Conference on Internet Multimedia Computing and Service,2012.
    [177] Z. Yang, L. Jin, D. Tao,“A Comparative Study of several Feature extraction methods for Person re-identification”[C], in Chinese Conference on Biometric Recognition,2012.
    [178] Z. Yang, D. Tao, S. Zhang, L. Jin,“Similar Handwritten Chinese Character Recognition based on Convolutional neural networks with Big Data”[J], Journal on Communication,2014.
    [179] Z. Yang, L. Jin, D. Tao,“Person re-identification by object-centric model”[J], unpublished.
    [180] Z. Yang, L. Jin, D. Tao, et al,“Single-layer unsupervised feature learning with L2 regularized sparse filtering”[C], unpublished.
    [181] J. Yang, K. Yu, Y. Gong, et al,“Linear spatial pyramid matching using sparse coding for image classification”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2009.
    [182] L. Yang, R. Jin, R. Sukthankar,“An efficient algorithm for local distance metric learning”[C], in National Conference on Artificial Intelligence,2006.
    [183] H. Yoo, H. Park, D. Jang,“Expert system for color image retrieval”[J], Expert Systems with Application,28(2):347-357,2005.
    [184] A. Yuan, G. Bai, L. Jiao, et al,“Offline handwritten English character recognition based on convolutional neural network”[C], in IAPR International Workshop on Document Analysis Systems,2012.
    [185] M. Zeiler, G. Taylor, R. Fergus,“Adaptive deconvolutional networks for mid and high level feature learning”[C], in IEEE International Conference on Computer Vision,2011.
    [186] S. Zhang, L. Jin, D. Tao, Z. Yang,“A faster method for Chinese font recognition based on Harris corner”[C], in IEEE International Conference on Systems, Man, and Cybernetics,2013.
    [187] W. Zheng, S. Gong, T. Xiang,“Re-identification by Relative Distance Comparison”[J], IEEE Transactions on Pattern Analysis and Machine Intelligence,35(3):653-668,2013.
    [188] N. Zhou, Y. Shen, J. Peng, et al,“Learning inter-related visual dictionary for object recognition”[C], in IEEE Conference on Computer Vision and Pattern Recognition,2012.