User Age Group Recognition Based on Spatio-Temporal Word Embedding of Trajectory
  • Original title (Chinese): 基于轨迹时空词向量的用户年龄特征识别
  • Authors: WU Hao (吴浩); ZHANG Weiqiang (张威强); ZHANG Pengzhu (张朋柱)
  • Keywords: semantic trajectory; TF-IDF; word embedding; word2vec; classification
  • Chinese journal title: MESS
  • English journal title: Journal of Chinese Information Processing
  • Affiliation: Antai College of Economics & Management, Shanghai Jiao Tong University (上海交通大学安泰经济与管理学院)
  • Publication date: 2019-07-15
  • Publisher: 中文信息学报 (Journal of Chinese Information Processing)
  • Year: 2019
  • Volume: v.33
  • Funding: National Natural Science Foundation of China (91646205, 71421002); Fundamental Research Funds for the Central Universities, Shanghai Jiao Tong University (16JCCS08)
  • Language: Chinese
  • Record ID: MESS201907015
  • Page count: 10
  • Issue: 07
  • CN: 11-2325/N
  • Pages: 123-132
Abstract
Trajectory data generated when users access base stations over mobile networks reflect their living habits and behavior patterns in both time and space. Because temporal and spatial information are produced simultaneously, they should not be considered separately. This paper therefore extends the traditional TF-IDF method into a time-aware TFT-IDFT method for extracting the semantic information of trajectory points. The trajectories are then treated as documents and processed with word2vec to obtain spatio-temporal trajectory word vectors that carry both location (geometric) and semantic information. Multi-class classification models built on these vectors identify the age group to which each user belongs. Experimental results show that TFT-IDFT is more suitable than TF-IDF for extracting trajectory semantics, and that word vectors built with this method perform better in the age-group classification task.
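The abstract treats each user's trajectory as a document whose "words" are visited base stations. The sketch below illustrates that idea with the classic TF-IDF weighting only; the TFT-IDFT formula itself is not given in this record and is not reproduced here. The toy data, the `to_tokens` helper, and the encoding of time as a six-hour slot appended to the station ID are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per document using classic TF-IDF."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each token.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


def to_tokens(trajectory, slot_hours=6):
    """Encode (hour, station_id) visits as tokens such as 'S1@slot1'.

    Hypothetical encoding: it only shows one way time and place could be
    kept together in a single token, not the paper's method.
    """
    return [f"{station}@slot{hour // slot_hours}" for hour, station in trajectory]


# Toy data (assumed): each trajectory is a list of (hour_of_day, base_station_id).
trajectories = {
    "user_a": [(8, "S1"), (9, "S1"), (20, "S2")],
    "user_b": [(8, "S1"), (14, "S3"), (22, "S2")],
    "user_c": [(2, "S2"), (3, "S2"), (14, "S3")],
}
docs = [to_tokens(t) for t in trajectories.values()]
weights = dict(zip(trajectories, tf_idf(docs)))

for user, w in weights.items():
    print(user, {tok: round(val, 3) for tok, val in w.items()})
```

In this toy example a token that only one user produces, such as a night-time visit to station S2, receives a higher weight than tokens shared by several users. Token sequences of this kind could also be passed to a word2vec implementation to obtain vectors for a downstream classifier, which is the pipeline the abstract outlines.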
