Learning transferrable attention for joint balanced domain adaptation
  • English title: Learning transferrable attention for joint balanced domain adaptation
  • Authors: Wang Ronggui; Yao Xuchen; Yang Juan; Xue Lixia (汪荣贵; 姚旭晨; 杨娟; 薛丽霞), School of Computer and Information, Hefei University of Technology
  • Keywords: transfer learning; domain adaptation; attention mechanism; unsupervised learning; image recognition; convolutional neural networks
  • Journal: Journal of Image and Graphics (中国图象图形学报); database code ZGTB
  • Affiliation: School of Computer and Information, Hefei University of Technology
  • Publication date: 2019-07-16
  • Year: 2019; Issue: 07 (v.24, No.279)
  • Language: Chinese
  • Pages: 110-119 (10 pages)
  • Article ID: ZGTB201907011
  • CN: 11-3758/TB
Abstract
Objective: Existing image recognition methods perform well when training and test data are drawn from the same distribution, but they are unsuitable in practical scenarios, where recognition accuracy drops. Domain adaptation methods are an effective way to handle such problems: they address data that come from two related but differently distributed domains. Method: Based on an analysis of the data distributions, we propose a joint balanced adaptation method built on attention transfer, which transfers image features extracted from labeled source-domain data to the unlabeled target domain. First, an attention transfer mechanism carries the spatial category information of the labeled source domain over to the unlabeled target domain; by defining the attention of a convolutional neural network, this attended information improves recognition accuracy. Second, a prior distribution over the network parameters is introduced on the basis of the target dataset, giving the network the ability to automatically adjust the degree of feature alignment at each domain alignment layer. Finally, cross-domain biases describe the input distribution of each domain-specific feature alignment layer, quantitatively expressing the degree of domain adaptation learned at each layer. Result: The method achieves average recognition accuracies of 77.6% on Office-31 and 90.7% on Office-Caltech, substantially outperforming traditional handcrafted-feature methods and matching current state-of-the-art methods. Conclusion: The joint balanced domain adaptation method with attention transfer not only achieves high recognition accuracy but also automatically learns the degree of inter-domain feature alignment, and it verifies that inter-domain feature transfer improves network optimization.
        Objective Many image recognition methods perform well when training and test data are drawn from the same distribution. However, these methods are unsuitable in practical scenarios, where the distributions differ and recognition performance drops. Domain adaptation is an effective approach to such problems: it addresses data that come from two related domains but follow different distributions. In practical applications, labeling data takes substantial manual labor, so unsupervised learning has become a clear trend in image recognition. Transfer learning extracts knowledge from labeled data in the source domain and transfers it to the unlabeled target domain. Method We propose a joint balanced adaptive method based on an attention transfer mechanism, which transfers feature representations extracted from labeled datasets in the source domain to unlabeled datasets in the target domain. Specifically, we first transfer the spatial category information of the labeled source domain to the unlabeled target domain via the attention transfer mechanism. Neural networks reflect basic characteristics of the human brain, and attention is an important part of human visual experience that is closely tied to perception. Artificial attention mechanisms have developed as artificial neural networks have become increasingly popular in fields such as computer vision and pattern recognition. Letting a system learn to attend to objects has also become a research tool for understanding the mechanisms behind neural networks. Attention information can significantly improve image recognition accuracy once the attention of convolutional neural networks (CNNs) is defined. In this study, attention is viewed as a set of spatial maps that encode the spatial regions of the input on which the network focuses most strongly when determining its output.
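The idea of attention as spatial maps can be sketched as an activation-based attention map plus a transfer loss between networks. This is an illustrative sketch, not the paper's exact formulation: the function names and the sum-of-squares channel pooling are assumptions.

```python
import numpy as np

def attention_map(features):
    """Collapse a CNN activation tensor of shape (C, H, W) into a
    spatial attention map by summing squared activations over the
    channel axis, then L2-normalising the flattened map."""
    amap = np.sum(features ** 2, axis=0)          # (H, W)
    flat = amap.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-8)

def attention_transfer_loss(src_feat, tgt_feat):
    """Distance between normalised source and target attention maps;
    minimising it pushes the target network to focus on the same
    spatial regions as the source network."""
    return float(np.linalg.norm(attention_map(src_feat) -
                                attention_map(tgt_feat)))
```

Normalising each map before comparison makes the loss depend only on where the activation energy sits, not on its magnitude, which is what "spatial category information" suggests here.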
Second, we introduce a prior distribution over the network parameters on the basis of the target dataset and endow each domain alignment layer with the capability of automatically learning the degree of alignment that should be pursued at different levels of the network. We expect to explore abundant source-domain attributes through cross-domain learning and capture complex cross-domain knowledge by embedding cross-dataset information, minimizing the loss of the original objective for the learning tasks in both domains as much as possible. Machine learning is an alternative approach that recognizes refined features after raw data are preprocessed into features on the basis of human prior knowledge. Because recognition results depend on feature quality, machine learning experts used to spend most of their time designing features. Recent breakthroughs in object recognition have been achieved mainly by approaches based on deep CNNs, whose feature extraction and image representation capabilities exceed those of manually defined features such as HOG and SIFT. The higher the network layer, the more specific its features are to the target categorization task. Meanwhile, features on successive layers interact in a complex and fragile way, so neurons in neighboring layers co-adapt during training. The transferability of features and classifiers therefore decreases as the cross-domain difference increases. Finally, we describe the input distribution of each domain-specific alignment layer by introducing cross-domain biases, thereby quantitatively indicating the degree of inter-domain adaptation that each layer learns. Meanwhile, we adaptively change the weight of each category in the dataset. Deep CNNs form a unified training and prediction framework that combines multi-level feature extractors and recognizers, which makes end-to-end processing particularly important.
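A domain alignment layer with a tunable alignment degree can be sketched as normalising each domain with a convex mix of the two domains' batch statistics, in the spirit of AutoDIAL-style alignment layers cited in this line of work. The parameter name `alpha` and the numpy formulation are illustrative assumptions; in the method described here the alignment degree is learned per layer, whereas this sketch takes it as an argument.

```python
import numpy as np

def domain_alignment_layer(x_src, x_tgt, alpha=0.7, eps=1e-5):
    """Normalise source and target batches of shape (N, F) with a
    convex mix of the two domains' batch statistics.  alpha = 1.0
    keeps the domains independent (plain per-domain normalisation);
    alpha = 0.5 couples them fully.  Learning alpha per layer lets
    the network decide how strongly each layer should be aligned."""
    mu_s, var_s = x_src.mean(axis=0), x_src.var(axis=0)
    mu_t, var_t = x_tgt.mean(axis=0), x_tgt.var(axis=0)
    # Mixed statistics: each domain sees mostly its own stats,
    # blended with the other domain's according to alpha.
    mu_sa  = alpha * mu_s  + (1 - alpha) * mu_t
    var_sa = alpha * var_s + (1 - alpha) * var_t
    mu_ta  = alpha * mu_t  + (1 - alpha) * mu_s
    var_ta = alpha * var_t + (1 - alpha) * var_s
    return ((x_src - mu_sa) / np.sqrt(var_sa + eps),
            (x_tgt - mu_ta) / np.sqrt(var_ta + eps))
```

Reading off the learned mixing coefficient at each layer then gives exactly the kind of quantitative, per-layer measure of adaptation degree that the cross-domain bias is meant to expose.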
The design of our model fully exploits the capability of CNNs to perform end-to-end processing. Result The average recognition accuracies of the method on the Office-31 and Office-Caltech datasets are 77.6% and 90.7%, respectively. The method thus significantly outperforms traditional methods based on handcrafted features and is comparable with state-of-the-art methods. Although not every single transfer task achieves the optimal result, the average recognition accuracy over the six transfer tasks improves on current mainstream methods. Conclusion Transferring image features extracted from labeled data in the source domain to the unlabeled target domain effectively solves problems in which data from two domains are related but differently distributed. The method fully utilizes the spatial location information of the labeled source-domain data through the attention transfer mechanism and uses a deep CNN to learn the degree of feature alignment between domains automatically. Learning ability largely depends on the degree of inter-domain correlation, which is a major limitation of transfer learning; knowledge transfer is clearly ineffective if no similarity exists between the domains. We therefore fully consider the feature correlation between the source and target datasets and adaptively change the weight of each category. Our method not only obtains high recognition accuracy but also automatically learns the degree of feature alignment between domains, and it verifies that inter-domain feature transfer can improve network optimization.
