Gender Classification Method for Chinese Micro-blog Users Based on Two-Classifier Fusion (两分类器融合的中文微博用户性别分类方法)
  • Title (English): Gender classification method for Chinese micro-blog users based on two classifier fusion
  • Authors: ZHANG Pu (张璞); CHEN Chao (陈超); CHEN Tao (陈韬); WANG Yong (王永)
  • Affiliations: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications; College of Economics and Management, Chongqing University of Posts and Telecommunications
  • Keywords: Chinese micro-blog; gender classification; micro-blog text features; convolutional neural network; classifier fusion
  • Journal code: SJSJ
  • Journal: Computer Engineering and Design (计算机工程与设计)
  • Publication date: 2019-01-16
  • Publisher: Computer Engineering and Design
  • Year: 2019
  • Volume/No.: Vol. 40; No. 385
  • Funding: Ministry of Education Humanities and Social Sciences Research Youth Fund Project (17YJCZH247); Chongqing Municipal Education Commission Humanities and Social Sciences Research Project "Product Review Mining and Its Applications in the Context of Social Media" (17SKG055); National Natural Science Foundation of China (61472464); Chongqing University of Posts and Telecommunications Social Science Fund Key Project (2018KZD06)
  • Language: Chinese
  • File ID: SJSJ201901044
  • Page count: 5
  • Issue: 01
  • CN: 11-1775/TP
  • Pages: 276-280
Abstract
To address the scarcity of research on gender classification of Chinese micro-blog users, incomplete micro-blog feature extraction, and classification accuracy that leaves room for improvement, a gender classification method for Chinese micro-blog users based on the fusion of two classifiers is proposed. First, a series of hand-crafted features is extracted from micro-blog text data to build one classifier and obtain its classification results. Second, a convolutional neural network model extracts features automatically and classifies user gender, yielding a second set of classification results. Finally, the results of the two classifiers are fused with an XGBoost model to obtain the final user gender classification. Experimental results show that the proposed method classifies better than a series of comparison methods.
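The fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record does not say what form the two classifiers' "results" take, so the sketch assumes each classifier outputs a probability for one class, and the numbers in `meta_X` are made up for illustration. To stay dependency-free, a tiny gradient-boosted ensemble of decision stumps stands in for XGBoost (which adds second-order gradients, regularization, and deeper trees); all function names here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split that best fits the residuals
    in the least-squares sense; returns (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]

def fit_fusion(meta_X, y, rounds=20, lr=0.5):
    """Gradient boosting on log loss: each round fits a stump to the
    residuals y - p and adds it to the running score."""
    stumps, scores = [], [0.0] * len(y)
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(meta_X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + lr * (lv if row[j] <= t else rv)
                  for s, row in zip(scores, meta_X)]
    return stumps

def predict_fusion(stumps, row, lr=0.5):
    score = sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical meta-features: [p_handcrafted, p_cnn] per user, where each
# value is the probability of class 1 assigned by one of the two classifiers.
meta_X = [[0.9, 0.8], [0.7, 0.4], [0.6, 0.9], [0.2, 0.3],
          [0.4, 0.1], [0.3, 0.6], [0.8, 0.7], [0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # made-up labels

stumps = fit_fusion(meta_X, y)
predictions = [predict_fusion(stumps, row) for row in meta_X]
print(predictions)
```

In this toy data the first classifier's score happens to separate the classes on its own, so the fused model learns to rely on it. In practice the meta-features would come from held-out predictions of the two trained classifiers, and the stand-in booster would be replaced by an actual XGBoost model.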
References
[1] Ciot M, Sonderegger M, Ruths D. Gender inference of Twitter users in non-English contexts[C]//Proceedings of Empirical Methods in Natural Language Processing, 2013: 1136-1145.
[2] Miller Z, Dickinson B, Hu W. Gender prediction on Twitter using stream algorithms with n-gram character features[J]. International Journal of Intelligence Science, 2012, 2(4): 143-148.
[3] LIU Baoqin, NIU Yun. Gender recognition of Chinese microblog users based on emotional features[J]. Computer Engineering & Science, 2016, 38(9): 1917-1923 (in Chinese).
[4] DAI Bin, LI Shoushan, GONG Zhengxian, et al. Semi-supervised gender classification with multiple type of text[J]. Journal of Shanxi University (Natural Science Edition), 2017, 40(1): 14-20 (in Chinese).
[5] Li S, Wang J, Zhou G, et al. Interactive gender inference with integer linear programming[C]//International Joint Conference on Artificial Intelligence, 2015: 2341-2347.
[6] WANG Jingjing, LI Shoushan, HUANG Lei. User gender classification in Chinese microblog[J]. Journal of Chinese Information Processing, 2014, 28(6): 150-155 (in Chinese).
[7] ZHOU Feiyan, JIN Linpeng, DONG Jun. Review of convolutional neural network[J]. Chinese Journal of Computers, 2017, 40(6): 1229-1251 (in Chinese).
[8] WANG Shengyu, ZENG Biqing, HU Pianpian. Chinese sentiment analysis based on parameter optimization of convolutional neural network[J]. Computer Engineering, 2017, 43(8): 200-207 (in Chinese).
[9] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems, 2013: 3111-3119.
[10] Chen T, Guestrin C. XGBoost: A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 785-794.
[11] XU Jianmin, SU Wulin, WU Shufang, et al. Modeling user reliability based on logistic regression in micro-blog[J]. Computer Engineering and Design, 2015, 36(3): 772-777 (in Chinese).
[12] ZHANG Rongfang. Gender differences on the use of the contemporary Chinese modal particles[D]. Dalian: Dalian University of Technology, 2014 (in Chinese).
[13] Kim Y. Convolutional neural networks for sentence classification[C]//Proceedings of Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[14] YANG Yuting, WANG Mingyang, TIAN Xianyun, et al. Sina microblog sentiment classification based on distributed representation of documents[J]. Journal of Intelligence, 2016, 35(2): 151-156 (in Chinese).
