Patent Topic Discovery Method Integrated with Term Knowledge

Original title: 融入术语知识的专利主题发现方法
Authors: Yu Yan (俞琰); Zhao Naixuan (赵乃瑄)
Keywords: patent analysis; topic discovery; term
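Journal code: TSQB
Journal: Library and Information Service
Affiliations: Information Service Department, Nanjing Tech University (南京工业大学信息服务部); Computer Science Department, Southeast University Chengxian College (东南大学成贤学院电子与计算机学院)
Publication date: 2018-11-05
Publisher: Library and Information Service (图书情报工作)
Year: 2018
Volume and issue number: v.62; No.610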
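Funding: One of the research outputs of the Ministry of Education Humanities and Social Sciences Planning Project "Research on Constructing a Skill Knowledge Graph in the Big Data Era" (Grant No. 16YJAZH073) and the National Social Science Fund of China General Project "Research on Multi-Dimensional, Multi-Level Patent Text Mining to Support Innovative Design in the Big Data Era" (Grant No. 17BTQ059)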
Language: Chinese
File name: TSQB201821024
Page count: 9
Issue: 21
CN: 11-1541/G2
Pages: 119-127

Abstract

[Purpose/Significance] Patent topic analysis that uses single words as the basic unit has difficulty recognizing the multi-word terms in patents, which degrades topic model results and makes topics hard to interpret. To address this problem, this paper proposes a patent topic discovery model that integrates term knowledge. [Method/Process] The model first introduces class entropy to identify terms in patent documents effectively. It then uses the generalized Pólya urn model to increase the probability that semantically similar terms are assigned to the same topic, which alleviates the data sparsity that arises when terms serve as the basic unit of topic model analysis. [Result/Conclusion] Experimental results show that the term information incorporated in the proposed model improves the quality of the generated topics and makes topic representations more readable and more discriminative.
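
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's formulas, so the code assumes the textbook definitions: class entropy is taken as the Shannon entropy of a candidate term's frequency distribution over patent classes, and the generalized Pólya urn step is taken as a count update that also promotes related terms in the same topic. The function names, the class labels, the example terms, and the similarity weight 0.3 are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def class_entropy(counts_by_class):
    """Shannon entropy (in bits) of a term's distribution over classes.

    Assumed definition: the record does not state the paper's formula.
    """
    total = sum(counts_by_class.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts_by_class.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def gpu_update(topic_term_counts, topic, term, related, delta=1.0):
    """Generalized Polya urn style count update (assumed form).

    Adds `delta` to the count of `term` in `topic`, and adds
    delta * weight to the count of every term related to it.
    """
    topic_term_counts[topic][term] += delta
    for other, weight in related.get(term, {}).items():
        topic_term_counts[topic][other] += delta * weight


# Hypothetical data, for illustration only.
focused = {"H04W": 9, "G06F": 1}   # occurrences concentrated in one class
spread = {"H04W": 5, "G06F": 5}    # occurrences spread evenly over two classes
related = {"base station": {"access point": 0.3}}

# Topic -> term -> (possibly fractional) count.
counts = defaultdict(lambda: defaultdict(float))
gpu_update(counts, topic=0, term="base station", related=related)

print(class_entropy(focused), class_entropy(spread))
print(dict(counts[0]))
```

In this sketch a candidate concentrated in one class scores a lower entropy than one spread evenly across classes, and assigning "base station" to topic 0 also raises the count of "access point" in that topic. How the paper thresholds the entropy score or derives the similarity weights is not stated in this record.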
