Technology Demand Recognition Model Based on Semantic Similarity Clustering from a Supply-Demand Matching Perspective
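  • English title: Technology demand recognition model based on semantic similarity under the supply-demand matching perspective
  • Authors: 何喜军; 张婷婷; 武玉英; 蒋国瑞
  • Authors (English): HE Xijun; ZHANG Tingting; WU Yuying; JIANG Guorui; School of Economics and Management, Beijing University of Technology
  • Keywords: technology supply-demand matching degree (技术供需匹配度); technology demand recognition (技术需求识别); word-vector semantic similarity (词向量语义相似度)
  • Keywords (English): technology supply and demand matching degree; technical demand recognition; word vector semantics similarity
  • Journal (Chinese): XTLL
  • Journal (English): Systems Engineering-Theory & Practice
  • Institution: School of Economics and Management, Beijing University of Technology
  • Publication date: 2019-02-25
  • Publisher: Systems Engineering-Theory & Practice
  • Year: 2019
  • Issue: v.39
  • Funding: Beijing Natural Science Foundation (9172002)
  • Language: Chinese
  • Page: XTLL201902018
  • Page count: 10
  • CN: 02
  • ISSN: 11-2267/N
  • Classification number: 206-215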
Abstract
From the perspective of matching technology supply and demand texts, this paper proposes a technology demand recognition model based on semantic similarity clustering. First, technology demand texts are collected from the web and key phrases are extracted from them. Second, an index library of patent technology transfers in the domain is built; highly relevant patents are retrieved using the demand key phrases to form a patent technology supply background library, and the titles and abstracts of the patents in that library are word-segmented. Third, a word-vector-based algorithm for the semantic matching degree between supply and demand texts is proposed and used to filter effective technology demands, which are then clustered by semantic similarity. Finally, the clustering results are classified along two dimensions: the demand volume of each technology demand and the corresponding volume of patent technology transfers. In an empirical study of the new energy field, 195 effective technology demands are identified and grouped into 12 clusters by semantic similarity. Combining demand volume with patent transfer volume, the 12 clusters fall into four categories: "high demand, high transfer", "high demand, low transfer", "low demand, high transfer", and "low demand, low transfer". The study offers a new approach to mining technology demands from the web and to matching supply with demand.
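The abstract does not give the formula for the semantic matching degree or the cut-offs used in the two-dimensional classification, so the following is only a minimal sketch under stated assumptions. It assumes the matching degree can be approximated by the cosine similarity between the averaged word vectors of a demand text and a patent text, and that each cluster is labeled by comparing its demand volume and transfer volume against two thresholds. The function names, the toy vectors in `VECTORS`, and the threshold parameters are all hypothetical and are not taken from the paper. The clustering step itself is left out.

```python
import math


def mean_vector(tokens, vectors):
    """Average the embeddings of the tokens that appear in `vectors`."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token has an embedding
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_degree(demand_tokens, patent_tokens, vectors):
    """Assumed matching degree: cosine of the two averaged word vectors."""
    d = mean_vector(demand_tokens, vectors)
    p = mean_vector(patent_tokens, vectors)
    if d is None or p is None:
        return 0.0
    return cosine(d, p)


def classify_two_dim(clusters, demand_cut, transfer_cut):
    """Label each cluster by demand volume and patent-transfer volume.

    `clusters` maps a cluster name to (demand_volume, transfer_volume);
    the two cut-offs are hypothetical parameters.
    """
    labels = {}
    for name, (demand, transfers) in clusters.items():
        d = "high demand" if demand >= demand_cut else "low demand"
        t = "high transfer" if transfers >= transfer_cut else "low transfer"
        labels[name] = f"{d}, {t}"
    return labels


# Toy 2-dimensional vectors, made up for illustration only; real vectors
# would come from a trained word-embedding model.
VECTORS = {
    "solar": [1.0, 0.0],
    "photovoltaic": [0.9, 0.1],
    "battery": [0.0, 1.0],
}
```

With these toy vectors, `matching_degree(["solar"], ["photovoltaic"], VECTORS)` is close to 1 while `matching_degree(["solar"], ["battery"], VECTORS)` is 0, so a threshold between the two would keep the first demand-patent pair and discard the second. Averaging word vectors is the simplest way to compare short texts; a weighted average (for example by key-phrase score) would be a natural variation, but nothing in the abstract confirms which variant the authors used.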
