With the spread of digital devices and smartphones, and the growing popularity of social networks and online photo sharing, the number of web images keeps increasing, and so does the demand for related applications. Large-scale image data and its applications are both a great challenge and a good opportunity for research in image recognition, including object detection, image classification, and image retrieval.
     In the past few years, object retrieval has been a hot topic in image retrieval. Sparse image representations generated with a large visual vocabulary enable fast search. Through our studies on learning visual patterns in the local feature space and on image representation, we can generate high-performance image representations rapidly, contributing to better image retrieval systems.
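The role of sparsity in fast search can be illustrated with a minimal Python sketch of an inverted index over bag-of-visual-words images. It assumes each image has already been quantized into a list of visual-word ids and uses plain tf-idf weighting with L2 normalization of the database vectors; the function names and input format are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(db_words):
    """Build an inverted index over bag-of-visual-words images.

    db_words: one list of visual-word ids per database image (assumed
    input format). Returns the index, the idf weights and the L2 norm of
    each image's tf-idf vector.
    """
    index = defaultdict(list)  # visual word id -> [(image id, term count)]
    for img_id, words in enumerate(db_words):
        for w, cnt in Counter(words).items():
            index[w].append((img_id, cnt))
    n_images = len(db_words)
    idf = {w: math.log(n_images / len(postings)) for w, postings in index.items()}
    norms = [0.0] * n_images
    for w, postings in index.items():
        for img_id, cnt in postings:
            norms[img_id] += (cnt * idf[w]) ** 2
    norms = [math.sqrt(v) or 1.0 for v in norms]  # guard against all-zero vectors
    return index, idf, norms


def search(query_words, index, idf, norms):
    """Rank database images by normalized tf-idf dot product with the query.

    Only the posting lists of words that occur in the query are visited, so
    images sharing no visual word with the query are never touched while
    scores are accumulated.
    """
    scores = [0.0] * len(norms)
    for w, q_cnt in Counter(query_words).items():
        for img_id, d_cnt in index.get(w, []):
            scores[img_id] += (q_cnt * idf[w]) * (d_cnt * idf[w])
    scores = [s / n for s, n in zip(scores, norms)]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

For example, with three database images `[[1, 2, 3], [4, 5, 6], [1, 2, 7]]`, the query `[1, 2, 3]` ranks image 0 first and image 1 (no shared words) last. The larger the vocabulary, the shorter each posting list tends to be, which is why a large vocabulary helps search speed.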
     To perform recognition on large-scale image collections, visual attribute learning and mid-level image representation have become hot research topics in recent years. We studied the learning of visual attributes and the generation of mid-level representations, with the aim of learning attributes at large scale rapidly and generating high-performance mid-level representations for recognition and retrieval.
     Our contributions and novelty are summarized as follows.
     (1) To address the bottleneck of existing large-scale image retrieval systems, we proposed an algorithm for the fast construction of high-performance visual vocabularies. Large-scale image retrieval systems depend on large vocabularies to generate sparse representations, which are indexed by an inverted table for fast and exact search. Exploiting the inheritance of visual patterns across the iterations of an approximate algorithm, we proposed a robust approximate algorithm with guaranteed and rapid convergence. The proposed algorithm requires almost no additional time or memory. Theoretical proofs guarantee that it converges to the same solution as the exact algorithm. Experimental results show that our algorithm is about 10 times faster than the existing state-of-the-art algorithm at generating equivalent vocabularies. With it, a large-scale image retrieval system can easily generate an even larger, high-performance vocabulary, which effectively supports both the search speed and the accuracy of the retrieval system. In addition, the proposed algorithm can be used in other visual pattern discovery tasks to construct a set of visual patterns rapidly.
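The general idea of an approximate k-means whose assignments are "inherited" between iterations can be sketched in Python as follows. This is a hedged illustration, not the thesis's algorithm: the approximate nearest-center search is simulated by checking a small random subset of centers (a stand-in for a KD-tree or similar index), and "inheritance" is read as always keeping each point's previous center among its candidates. The names `approx_kmeans`, `n_candidates` and `seed` are hypothetical. Under this reading the k-means objective cannot increase from one iteration to the next; the sketch does not reproduce the convergence proof or the reported speed-up.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_kmeans(points, k, iters=20, n_candidates=3, seed=0):
    """Hypothetical approximate k-means with inherited assignments.

    Each point compares itself only against a few randomly chosen centers
    (a stand-in for approximate nearest-neighbour search) plus the center
    it was assigned to in the previous iteration. Returns the centers, the
    assignment and the objective value recorded after each iteration.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    # Initial exact assignment to the nearest center.
    assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    history = []
    for _ in range(iters):
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Approximate assignment step: random candidates + inherited center,
        # so no point can end up farther from its center than before.
        for i, p in enumerate(points):
            candidates = set(rng.sample(range(k), min(n_candidates, k)))
            candidates.add(assign[i])
            assign[i] = min(candidates, key=lambda c: sq_dist(p, centers[c]))
        history.append(sum(sq_dist(p, centers[a]) for p, a in zip(points, assign)))
    return centers, assign, history
```

Keeping the previous center as a candidate is what makes the objective monotone: the mean update never increases the squared error for a fixed assignment, and the reassignment never picks a worse center than the current one.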
     (2) To address the generation of image representations in large-scale image retrieval systems, we proposed a high-performance, parameter-insensitive algorithm for quantizing local features and generating image representations. Exploiting the locality of the Gaussian kernel function, the algorithm minimizes the kernel reconstruction error. It makes better use of more neighbors to generate high-performance sparse image representations, and the learned quantization weights capture more information from the distances, so the image representation is less sensitive to the neighbor-number parameter.
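One possible formulation of kernel-reconstruction quantization is sketched below in Python. It is an assumption, not the thesis's exact objective: for a local descriptor x, take its k nearest codewords b_j and choose weights w that minimize ||phi(x) - sum_j w_j phi(b_j)||^2 + ridge * ||w||^2 in the feature space of a Gaussian kernel, which reduces to the linear system (K + ridge*I) w = k_x. The function names, the ridge term and the default parameters are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel value between two vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def kernel_reconstruction_code(x, codebook, k=3, sigma=1.0, ridge=1e-6):
    """Sparse code for one local descriptor x (hypothetical formulation).

    Picks the k nearest codewords and solves (K + ridge*I) w = k_x, the
    minimizer of the regularized reconstruction error in the Gaussian-kernel
    feature space. Entries outside the k neighbours stay exactly zero.
    """
    nearest = sorted(range(len(codebook)),
                     key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))[:k]
    K = [[gaussian_kernel(codebook[i], codebook[j], sigma) + (ridge if i == j else 0.0)
          for j in nearest] for i in nearest]
    k_x = [gaussian_kernel(x, codebook[j], sigma) for j in nearest]
    weights = solve_linear(K, k_x)
    code = [0.0] * len(codebook)
    for j, w_j in zip(nearest, weights):
        code[j] = w_j
    return code
```

Each descriptor yields at most k non-zero entries, so summing or max-pooling these codes over all descriptors of an image gives a sparse, vocabulary-sized vector that fits an inverted index. When x coincides with a codeword, the weights collapse to (almost exactly) that single codeword.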
     (3) For the representation of general images, we proposed an indirect method, motivated by linear representation, to learn large numbers of latent visual attributes rapidly and generate high-performance image representations. In attribute-based mid-level representation, most existing works concatenate the outputs of attribute models into a long vector as the representation. We instead proposed to learn visual attributes indirectly by learning a single semantic subspace. The subspace learning algorithm can rapidly learn large numbers of latent visual attributes into this semantic subspace. Because the semantic subspace is rich in semantic concepts, the linear representation generated by linear projections performs well. In addition, the linear projections are semantic-aware and can be manually labeled with descriptions.
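The abstract does not specify the subspace-learning algorithm. As a hedged illustration of the general idea, where each latent attribute is one linear projection and the image representation is the vector of projections, the Python sketch below uses a simple centroid-based subspace: per-class (or per-concept) mean vectors orthonormalized by modified Gram-Schmidt. This construction and all function names are assumptions, not the proposed method.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centroid_subspace(features, labels):
    """Orthonormal basis spanned by per-label centroids (hypothetical stand-in).

    features: list of equal-length vectors; labels: one class/concept label
    per vector. Each centroid is orthogonalized against the basis built so far.
    """
    basis = []
    for c in sorted(set(labels)):
        members = [f for f, l in zip(features, labels) if l == c]
        u = [sum(col) / len(members) for col in zip(*members)]
        for b in basis:
            proj = dot(u, b)
            u = [ui - proj * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(dot(u, u))
        if norm > 1e-10:  # skip centroids already spanned by earlier ones
            basis.append([ui / norm for ui in u])
    return basis


def linear_representation(x, basis):
    """Project a feature vector onto each basis direction."""
    return [dot(x, b) for b in basis]
```

Because each basis vector is derived from one labeled concept, a direction can be inspected and described by hand, which loosely mirrors the "semantic-aware" projections described above; the thesis's own subspace learner may differ substantially.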
     (4) For the representation of general images, we proposed a nonlinear representation based on visual attributes. Existing linear representations share the shortcoming that they cannot exploit all the information in the attribute models. The proposed scheme is motivated by nonlinear representations used in other problems. It specifies requirements for three procedures: attribute definition, attribute model learning, and representation generation. Each attribute is defined as a highly biased binary classification problem; a support vector machine is recommended as the learning model; and the representation is generated by a nonlinear mapping whose parameter is an appropriately chosen scale value. Experiments show that the nonlinear representation improves the representation significantly.
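The representation-generation step can be sketched in Python under explicit assumptions: the abstract does not name the nonlinear mapping, so a scaled logistic sigmoid is used here, and each attribute model is assumed to be an already-trained linear SVM given as a hypothetical `(w, b)` pair whose decision value is `w . x + b`. Training the biased attribute classifiers is out of scope for this sketch.

```python
import math


def stable_sigmoid(z):
    """Logistic function evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nonlinear_representation(x, attribute_models, scale=1.0):
    """Map attribute-classifier scores through a scaled sigmoid.

    attribute_models: list of (w, b) pairs for hypothetical pre-trained
    linear SVMs, one per attribute; the score is w.x + b. scale is the
    assumed scale parameter of the nonlinear mapping.
    """
    rep = []
    for w, b in attribute_models:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        rep.append(stable_sigmoid(score / scale))
    return rep
```

The scale value controls the shape of the mapping: a small scale pushes outputs toward 0 or 1 (almost binary attribute presence), while a large scale makes the sigmoid nearly linear around 0.5, approaching the plain concatenation of scores used by linear representations. This is one way to see why the scale must be chosen with care.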
     With the first two contributions, we proposed a scheme for generating high-performance sparse representations, which ensures that large-scale image retrieval systems can generate high-dimensional sparse representations rapidly.
     The last two contributions study visual attributes and mid-level representations from both the linear and the nonlinear perspective. The proposed method for fast learning of linear representations and the proposed scheme for generating high-performance nonlinear representations are helpful for future work on visual attributes and high-performance mid-level representations.