Fine-Grained Car Recognition Model Based on Semantic DCNN Features Fusion
  • Chinese title: 基于语义DCNN特征融合的细粒度车型识别模型
  • Authors: Yang Juan (杨娟); Cao Haoyu (曹浩宇); Wang Ronggui (汪荣贵); Xue Lixia (薛丽霞)
  • Keywords: car recognition; fine-grained car recognition; convolutional neural networks; deep learning; fine-grained recognition; image classification
  • Journal: 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics)
  • CNKI journal code: JSJF
  • Institution: School of Computer and Information, Hefei University of Technology
  • Publication date: 2019-01-15
  • Year: 2019
  • Volume: v.31
  • Issue: 01
  • Pages: 143-159
  • Page count: 17
  • Funding: National Natural Science Foundation of China (61672202)
  • Language: Chinese
  • CN: 11-2925/TP
  • Article ID: JSJF201901018
Abstract
Deep convolutional neural network (DCNN) models lack the ability to represent semantic information, yet in fine-grained visual recognition the differences between classes are small and concentrated on key semantic parts. To address this, the paper proposes a fine-grained car recognition model that fuses the semantic information of DCNN features. The model consists of a detection sub-network and a classification sub-network. First, the detection sub-network locates the car and each of its semantic parts with Faster R-CNN. Second, the classification sub-network extracts DCNN features for the whole car and for each semantic part, then concatenates and fuses these features using small-kernel convolutions. Finally, a deep neural network produces the recognition result. Experiments show that the model achieves 78.74% accuracy on the Stanford BMW-10 dataset, 13.39 percentage points higher than the VGG baseline, and 85.94% on the Stanford cars-197 dataset; its transfer-learning variant reaches 98.27% on the BMVC car-types dataset, 3.77 percentage points above the previous best result on that dataset. The model removes the dependence of fine-grained car recognition on annotated positions of the car and its semantic parts, while achieving high recognition accuracy and good generality.
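The concatenate-and-fuse step described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under assumed dimensions (`num_parts`, `feat_dim`, and the random features are invented for the example); it shows only the core idea that a 1x1 ("small kernel") convolution over stacked region features reduces to a learned weighted sum of those features. The paper's actual model learns such fusion weights as convolution kernels inside the DCNN, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: one feature vector for the whole car plus
# num_parts vectors for semantic parts (e.g. headlight, grille),
# each feat_dim-dimensional.
num_parts, feat_dim = 4, 8

whole_car = rng.standard_normal(feat_dim)
parts = rng.standard_normal((num_parts, feat_dim))

# Concatenation step: stack the whole-car feature and the part
# features into a (1 + num_parts) x feat_dim "feature map",
# one row per region.
stacked = np.vstack([whole_car[None, :], parts])   # shape (5, 8)

# Fusion step: a 1x1 convolution across the region axis is just a
# learned weighted sum of the rows, yielding one fused feature vector
# that a classifier head could consume.
weights = rng.standard_normal(stacked.shape[0])    # one weight per region
fused = weights @ stacked                          # shape (8,)

print(fused.shape)  # (8,)
```

In the real model the weights would be trained jointly with the rest of the network, and the fused vector would feed the deep classification layers rather than being printed.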
