Topical Phrase Mining for Patents

English title: Topical phrase mining for patent
Authors: MA Jian-hong (马建红); JI Shuai (姬帅); LIU Shuo (刘硕)
Affiliation: School of Computer Science and Engineering, Hebei University of Technology (河北工业大学计算机科学与软件学院)
Keywords: patent mining; term extraction; bidirectional long short-term memory; conditional random fields; topic model
Journal: Computer Engineering and Design (计算机工程与设计; journal code SJSJ)
Publication date: 2019-05-15
Publisher: Computer Engineering and Design
Year: 2019
Issue: v.40; No.389
Language: Chinese
Record ID: SJSJ201905031
Page count: 6
Issue number: 05
CN: 11-1775/TP
Pages: 173-177+190

Abstract

Word-based topic models tend to produce results that are hard to interpret when applied to Chinese patent topic mining. To address this, the paper proposes GW_PhraseLDA, an improved model that combines word vectors with the Generalized Pólya urn (GPU) scheme. Based on the characteristics of patent text, a BLSTM-CRF model is used to extract patent phrases, and trained word vectors are used to generate prior knowledge. During the iterations of Gibbs sampling, the GPU strategy raises the probability that semantically related phrases fall under the same topic. Experiments on Chinese patent texts show that the proposed model effectively improves the quality of the generated patent topics and is more interpretable and discriminative than traditional topic models.
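The two ideas the abstract combines, prior knowledge derived from word vectors and GPU promotion inside Gibbs sampling, can be illustrated with a minimal sketch. This is not the authors' GW_PhraseLDA implementation. It assumes phrases have already been extracted (the BLSTM-CRF step is not shown), links two phrases when the cosine similarity of their vectors reaches an arbitrary threshold, and lets each assignment of a phrase to a topic also add a fractional count for its linked phrases. All function names, the threshold of 0.8, the promotion weight of 0.3, the hyperparameters and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_phrases(vectors, threshold=0.8):
    """Prior knowledge: link phrase pairs whose vectors are similar enough."""
    related = defaultdict(list)
    names = list(vectors)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if cosine(vectors[p], vectors[q]) >= threshold:
                related[p].append(q)
                related[q].append(p)
    return dict(related)


def gpu_update(topic_phrase, topic_total, topic, phrase, related, sign, promotion=0.3):
    """Add (sign=+1) or remove (sign=-1) one assignment of `phrase` to `topic`.

    GPU step: the phrase itself counts 1, each related phrase counts `promotion`.
    """
    updates = [(phrase, 1.0)] + [(q, promotion) for q in related.get(phrase, [])]
    for p, weight in updates:
        topic_phrase[topic][p] += sign * weight
        topic_total[topic] += sign * weight


def gibbs(docs, related, num_topics=2, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling over phrase tokens with GPU-promoted counts."""
    rng = random.Random(seed)
    vocab_size = len({p for doc in docs for p in doc})
    topic_phrase = [defaultdict(float) for _ in range(num_topics)]
    topic_total = [0.0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        assign.append([rng.randrange(num_topics) for _ in doc])
        for p, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            gpu_update(topic_phrase, topic_total, z, p, related, +1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, p in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                gpu_update(topic_phrase, topic_total, z, p, related, -1)
                # max(..., 0.0) guards against tiny negative float residue.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (max(topic_phrase[k][p], 0.0) + beta)
                    / (max(topic_total[k], 0.0) + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                gpu_update(topic_phrase, topic_total, z, p, related, +1)
    return assign, topic_phrase


if __name__ == "__main__":
    # Toy 2-dimensional "word vectors" and two tiny documents of phrases.
    vectors = {
        "lithium battery": [1.0, 0.1],
        "battery pack": [0.9, 0.2],
        "image sensor": [0.1, 1.0],
    }
    docs = [["lithium battery", "battery pack", "lithium battery"],
            ["image sensor", "image sensor"]]
    related = related_phrases(vectors)
    assign, topic_phrase = gibbs(docs, related)
    print(related)
    print(assign)
```

Because a topic that receives "lithium battery" also gains a partial count for "battery pack", later draws for "battery pack" are nudged toward the same topic, which is the effect the abstract attributes to the GPU strategy. Subtracting the same promoted counts when a token is resampled is a common simplification rather than an exact sampler for the GPU model.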
