Text to Image Model With Classification-Reconstruction Stack Generative Adversarial Networks
  • Chinese Title: 分类重构堆栈生成对抗网络的文本生成图像模型
  • Authors: CHEN Xinjing (陈鑫晶); CHEN Duansheng (陈锻生)
  • Affiliation: College of Computer Science and Technology, Huaqiao University
  • Keywords: text-to-image generation; stack generative adversarial networks; classification; reconstruction; cross-modal learning
  • Journal: Journal of Huaqiao University (Natural Science) (华侨大学学报(自然科学版))
  • CNKI Journal Code: HQDB
  • Publication Date: 2019-07-17
  • Year: 2019
  • Volume/Issue: Vol. 40, Issue 04 (No. 168 overall)
  • Funding: National Natural Science Foundation of China (61502182); Fujian Provincial Key Science and Technology Program (2015H0025)
  • Language: Chinese
  • Pages: 135-141 (7 pages)
  • CN: 35-1079/N
  • ISSN: 1000-5013
  • CNKI Record ID: HQDB201904020
Abstract
Building on stacked generative adversarial networks, we propose a classification-reconstruction stack generative adversarial network for text-to-image synthesis. Stage I generates 64 px × 64 px images, and Stage II generates 256 px × 256 px images. At each stage of text-to-image generation, image category information and feature- and pixel-level reconstruction information are added to assist training, yielding higher-quality images. The proposed model is validated on the Oxford-102, Caltech-UCSD Birds (CUB), and Microsoft COCO (MS COCO) datasets, with the Inception Score used to evaluate the quality and diversity of the generated images. The results show that the proposed model is effective: its Inception Scores on the three datasets are 3.54, 4.16, and 11.45, respectively, improvements of 10.6%, 12.4%, and 35.5% over the stacked generative adversarial network baseline.
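From the abstract alone, the stage-wise training objective can be sketched as a conditional GAN loss augmented with an auxiliary classification term and feature- and pixel-level reconstruction terms. The following is only a plausible form: the loss weights $\lambda_c$, $\lambda_f$, $\lambda_p$, the feature extractor $f(\cdot)$, and the choice of norms are illustrative assumptions, as the record does not specify them.

$$
\mathcal{L}_{D_i} = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D_i(x, \varphi_t)\big] - \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D_i(G_i(z, \varphi_t), \varphi_t)\big)\big] + \lambda_c\,\mathcal{L}_{\mathrm{cls}}(D_i)
$$

$$
\mathcal{L}_{G_i} = -\,\mathbb{E}_{z \sim p_z}\big[\log D_i(G_i(z, \varphi_t), \varphi_t)\big] + \lambda_c\,\mathcal{L}_{\mathrm{cls}}(G_i) + \lambda_f\,\big\|f(x) - f(\hat{x})\big\|_2^2 + \lambda_p\,\big\|x - \hat{x}\big\|_1
$$

Here $i \in \{1, 2\}$ indexes the two stages (64 px × 64 px and 256 px × 256 px), $\varphi_t$ is the text embedding, $\hat{x} = G_i(z, \varphi_t)$ is the generated image, and $\mathcal{L}_{\mathrm{cls}}$ is a cross-entropy loss on the image category, in the spirit of the auxiliary classifier of TAC-GAN [11].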
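For reference, the evaluation metric is the standard Inception Score of Salimans et al. [15], not restated in this record:

$$
\mathrm{IS}(G) = \exp\Big(\mathbb{E}_{\hat{x} \sim p_G}\, D_{\mathrm{KL}}\big(p(y \mid \hat{x}) \,\big\|\, p(y)\big)\Big)
$$

where $p(y \mid \hat{x})$ is the label distribution an Inception network assigns to a generated image and $p(y)$ is its marginal over generated images. The reported relative gains imply StackGAN baselines of about $3.54/1.106 \approx 3.20$ (Oxford-102), $4.16/1.124 \approx 3.70$ (CUB), and $11.45/1.355 \approx 8.45$ (MS COCO), matching the scores reported in [9].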
References
[1] XU K,BA J,KIROS R,et al.Show,attend and tell:Neural image caption generation with visual attention[C]//International Conference on Machine Learning.Lille:[s.n.],2015:2048-2057.
    [2] ZOU Hui,DU Jixiang,ZHAI Chuanmin,et al.Cross-media retrieval based on deep learning and consistent representation space learning[J].Journal of Huaqiao University(Natural Science),2018,39(1):127-132.DOI:10.11830/ISSN.1000-5013.201508047.
    [3] WEI Yunchao,ZHAO Yao,LU Canyi,et al.Cross-modal retrieval with CNN visual features:A new baseline[J].IEEE Transactions on Cybernetics,2017,47(2):449-460.DOI:10.1109/TCYB.2016.2519449.
    [4] GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.Montreal:[s.n.],2014:2672-2680.
    [5] REED S,AKATA Z,YAN X,et al.Generative adversarial text to image synthesis[C]//International Conference on Machine Learning.New York:[s.n.],2016:1060-1069.
    [6] NILSBACK M E,ZISSERMAN A.Automated flower classification over a large number of classes[C]//Conference on Computer Vision,Graphics and Image Processing.Washington:IEEE Press,2008:722-729.DOI:10.1109/ICVGIP.2008.47.
    [7] WAH C,BRANSON S,WELINDER P,et al.Caltech-UCSD birds 200[EB/OL].[2011-10-26][2018-06-15].http://www.vision.caltech.edu/visipedia/CUB-200.html.
    [8] LIN Tsungyi,MAIRE M,BELONGIE S,et al.Microsoft COCO:Common objects in context[C]//European Conference on Computer Vision.Zurich:[s.n.],2014:740-755.
    [9] ZHANG Han,XU Tao,LI Hongsheng,et al.StackGAN:Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.Venice:IEEE Press,2017:5908-5916.DOI:10.1109/ICCV.2017.629.
    [10] ZHANG Han,XU Tao,LI Hongsheng,et al.StackGAN++:Realistic image synthesis with stacked generative adversarial networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2019,41(8):1947-1962.DOI:10.1109/TPAMI.2018.2856256.
    [11] DASH A,GAMBOA J,AHMED S,et al.TAC-GAN:Text conditioned auxiliary classifier generative adversarial network[EB/OL].[2017-03-26][2018-07-10].https://arxiv.org/abs/1703.06412.
    [12] NGUYEN A,CLUNE J,BENGIO Y,et al.Plug and play generative networks:Conditional iterative generation of images in latent space[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE Press,2017:3510-3520.DOI:10.1109/CVPR.2017.374.
    [13] SHARMA S,et al.ChatPainter:Improving text to image generation using dialogue[EB/OL].[2018-02-22][2018-06-12].https://arxiv.org/abs/1802.08216.
    [14] REED S,AKATA Z,LEE H,et al.Learning deep representations of fine-grained visual descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE Press,2016:49-58.DOI:10.1109/CVPR.2016.13.
    [15] SALIMANS T,GOODFELLOW I,ZAREMBA W,et al.Improved techniques for training GANs[C]//Advances in Neural Information Processing Systems.Barcelona:[s.n.],2016:2234-2242.
    [16] DAS A,KOTTUR S,GUPTA K,et al.Visual dialog[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE Press,2017:1080-1089.DOI:10.1109/TPAMI.2018.2828437.
