Unsupervised Feature Selection Algorithm for Dynamic Network Media Data Based on User Correlation
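  • English title: Unsupervised Feature Selection Algorithm for Dynamic Network Media Data Based on User Correlation
  • Authors: 任永功; 王玉玲; 刘洋; 张晶
  • Authors (English): REN Yong-Gong; WANG Yu-Ling; LIU Yang; ZHANG Jing; Department of Computer and Information Technology, Liaoning Normal University
  • Keywords: dynamic network media data; unsupervised feature selection; correlation; gradient descent; tie strength
  • Journal code (Chinese title): JSJX
  • Journal (English title): Chinese Journal of Computers
  • Institution: Department of Computer and Information Technology, Liaoning Normal University
  • Publication date: 2018-01-22 11:51
  • Publisher: Chinese Journal of Computers (计算机学报)
  • Year: 2018
  • Issue: v.41; No.427
  • Funding: National Natural Science Foundation of China (61373127); Program for Excellent Talents in Universities of Liaoning Province (LR2015033); Science and Technology Plan Project of Liaoning Province (2013405003); Science and Technology Plan Project of Dalian (2013A16GX116); Doctoral Start-up Foundation of Liaoning Province (20170520207)
  • Language: Chinese
  • Pages: JSJX201807006
  • Page count: 19
  • CN: 07
  • ISSN: 11-1826/TP
  • Classification number: 89-107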
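Abstract
The rapid development of the mobile Internet and social media has greatly increased the demand, across many fields, for processing network media data such as text, images, and video. Such data are high-dimensional, dynamically updated, and complex in content, which makes feature computation and classification harder. Meanwhile, current feature selection methods for network media data mainly target static data and place strict requirements on the regularity of the data format. To address these problems and ensure real-time feature extraction for dynamic network media data, this paper proposes an unsupervised feature selection algorithm for dynamic network media data based on user correlation (Unsupervised Feature Selection Algorithm for Dynamic Network Media Based on User Correlation, UFSDUC). First, the relationships among interacting users in a social network are analyzed and used as constraints for unsupervised feature selection. Then, a feature selection model of user correlation is built with the Laplacian operator, the tie strength between correlated users is quantified, and the Lagrange multiplier method is used to derive the optimal user relationship in the feature model. Finally, a threshold for the dynamic network media data is set based on gradient descent and used to compute the nonzero feature weights that update the optimal feature subset, so that network media data can be classified effectively. The algorithm can perform accurate, real-time feature selection on dynamic network media data while keeping user correlation complete. Three standard network media datasets are used, and the algorithm is compared with five currently popular algorithms of the same type to verify its effectiveness.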
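The thresholding step can be pictured with a minimal sketch. This is not the paper's UFSDUC procedure, whose objective this record does not give. It assumes a squared loss with an L1 penalty and applies a grafting-style test: a newly arriving feature is admitted only if the magnitude of the loss gradient with respect to its (still zero) weight exceeds a threshold. The names `admit_feature`, `fit_weight`, `residual`, and `lam` are hypothetical, and `residual` stands in for whatever target the unsupervised model is currently fitting.

```python
import numpy as np

def admit_feature(x_new, residual, lam):
    """Grafting-style test for one incoming feature column.

    Assumed loss: 0.5/n * ||residual - w * x_new||^2 + lam * |w|.
    At w = 0 the smooth part has gradient -(x_new @ residual) / n, and the
    feature can get a nonzero weight only if |gradient| > lam.
    """
    n = x_new.shape[0]
    grad = -(x_new @ residual) / n
    return abs(grad) > lam, grad

def fit_weight(x_new, residual, lam, lr=0.1, steps=200):
    """Proximal gradient descent on a single weight under the same assumed loss."""
    n = x_new.shape[0]
    w = 0.0
    for _ in range(steps):
        grad = -(x_new @ (residual - w * x_new)) / n
        w = w - lr * grad
        # Soft-thresholding keeps weights of weak features at exactly zero.
        w = np.sign(w) * max(abs(w) - lr * lam, 0.0)
    return w
```

In this sketch a feature that fails the test keeps a weight of exactly zero, so the selected subset is simply the set of features with nonzero weights, which matches the role the abstract gives to the threshold.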
With the rapid development of the mobile Internet and social media, Internet multimedia data such as text, images, and video are produced continuously, and the demand for learning from and applying such data keeps growing. However, the high dimensionality, complex content, and dynamic updating of Internet multimedia data severely limit the efficiency of feature computation and classification. Moreover, traditional algorithms mainly address feature extraction and classification for static multimedia data, and they require the data format to conform to a specific standard. To address these problems, we propose an efficient unsupervised feature selection algorithm based on user correlation, called UFSDUC (Unsupervised Feature Selection Algorithm for Dynamic Network Media Based on User Correlation), to ensure real-time feature extraction for dynamic multimedia data. First, we analyze user relationships in social networks and, taking potential social factors into account, abstract three relational models: MFS (Multi-user Follow Same user), SFM (Same user Follow Multi-user), and FEO (Follow Each Other). These models serve as the constraints for unsupervised feature selection. Second, we use the Laplacian operator together with the strength of the relationships between users to build the relationship model, and then apply the Lagrange multiplier method to obtain a mathematical expression for the optimal relationship in the feature model. The algorithm also quantifies the tie strength between users: the stronger the correlation between two users, the more similar their feature information is likely to be. On this basis, the algorithm obtains the optimal solution for social network multimedia data. Finally, we set a threshold for the social network multimedia data using the gradient descent method. This threshold is used to obtain the nonzero feature weights, which then update the optimal feature subset so that the multimedia data can be classified effectively. The contributions of the proposed algorithm can be summarized as follows: (1) unlike traditional feature selection algorithms, in which every sample needs a class label, the proposed unsupervised algorithm can define feature relationships by other criteria without labeled samples, for instance the similarity between samples and the distribution of local information; (2) correlation information between users is more stable than the information of individual users (a circle of friends, once established, tends to persist on the Internet), so user correlation provides an important constraint for feature extraction from multimedia data; (3) the proposed algorithm performs feature selection efficiently and in real time, with complete user correlation as a precondition. We use three standard multimedia datasets to verify the proposed algorithm: the Sina Weibo dataset, the Flickr dataset, and the BlogCatalog dataset from 'Datatang'. These datasets have characteristics that make feature extraction harder, such as a large number of users, complex relationships between users, and many categories of users. We also compare the proposed algorithm with five popular algorithms to evaluate its performance.
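The three relational models and the Laplacian-based relationship model can be illustrated with a small sketch. The abstract names the models but not their formulas, so everything here is an assumption: `F[i, j] = 1` means user `i` follows user `j`, MFS is read as co-following counts, SFM as co-follower counts, FEO as mutual follows, and the mixing weights `a`, `b`, `c` are hypothetical. The score `f^T L f` is the standard graph smoothness measure, not necessarily the paper's objective.

```python
import numpy as np

def tie_strength(F, a=1.0, b=1.0, c=1.0):
    """Combine three assumed relation types into a symmetric tie-strength matrix."""
    F = np.asarray(F, dtype=float)
    mfs = F @ F.T   # [i, k]: number of accounts that both i and k follow
    sfm = F.T @ F   # [j, l]: number of accounts that follow both j and l
    feo = F * F.T   # [i, j]: 1 if i and j follow each other
    W = a * mfs + b * sfm + c * feo
    np.fill_diagonal(W, 0.0)  # no self-ties
    return W

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(X, L):
    """Per-feature f^T L f for a users-by-features matrix X.

    Equals 0.5 * sum_ij W[i, j] * (f[i] - f[j])**2, so a low value means
    strongly tied users have similar values for that feature.
    """
    return np.einsum("if,ij,jf->f", X, L, X)
```

Under these assumptions, a feature with a low smoothness score varies little between strongly tied users, which is the intuition the abstract states when it links tie strength to feature similarity.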
