Acquisition of Technical Themes for Patent Technical Theme Analysis (面向专利技术主题分析的技术主题获取)

English title: Acquisition of Technical Theme for Patent Technical Theme Analysis
Authors: Hou Ting (侯婷); Lü Xueqiang (吕学强); Li Zhuo (李卓); Xu Liping (徐丽萍)
English authors: Hou Ting;
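Keywords: patent (专利); subject analysis (主题分析); technical theme extraction (技术主题抽取); similarity (相似度); standardization (规范化)
English keywords: patent; subject analysis; technical theme extraction; similarity; standardization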
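Chinese journal title: QBLL
English journal title: Information Studies: Theory & Application
Institutions: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University; Beijing Research Center of Urban System Engineering
Publication date: 2015-05-13 11:45
Publisher: 情报理论与实践 (Information Studies: Theory & Application)
Year: 2015
Issue: v.38; No.256
Funding: One of the outcomes of the National Natural Science Foundation of China project "Research on Ontology-Based Automatic Patent Indexing" (No. 61271304) and of the Key Project of the Beijing Municipal Education Commission Science and Technology Development Plan, also a Class B Key Project of the Beijing Natural Science Foundation, "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. kz201311232037)
Language: Chinese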
File name: QBLL201505025
Page count: 6
Issue number: 05
CN: 11-1762/G3
Pages: 131-135+146

Abstract

This paper uses text mining to extract technical themes and to standardize them, laying the groundwork for technical theme analysis. Based on how technical themes are distributed in patent titles and on the statistical length characteristics of the theme words used in technical theme analysis, the paper proposes a method for computing a theme degree and takes the words with a high theme degree as theme words. Synonym pairs among the theme words are obtained by computing similarity, and statistical features are then used to give the theme words a standardized representation. The experimental results show that the proposed theme word extraction method is effective, with an accuracy of 95.5% and a recall of 95.5%. The authors also report that the proposed theme standardization method is of considerable significance.
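The synonym-pair and standardization step described in the abstract can be illustrated with a minimal sketch. The record gives neither the similarity measure nor the statistical features, so everything here is an assumption: cosine similarity over word vectors stands in for the similarity calculation, corpus frequency stands in for the statistical feature, and the names `cosine`, `synonym_pairs`, `standardize`, the threshold of 0.9, and the toy vectors and counts are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_pairs(vectors, threshold=0.9):
    """Return (word1, word2, similarity) for every pair at or above the threshold."""
    pairs = []
    for w1, w2 in combinations(sorted(vectors), 2):
        sim = cosine(vectors[w1], vectors[w2])
        if sim >= threshold:
            pairs.append((w1, w2, sim))
    return pairs


def standardize(pairs, freq):
    """Map both words of each synonym pair to the more frequent one.

    Frequency is an assumed stand-in for the paper's statistical features.
    """
    mapping = {}
    for w1, w2, _ in pairs:
        preferred = w1 if freq.get(w1, 0) >= freq.get(w2, 0) else w2
        mapping[w1] = preferred
        mapping[w2] = preferred
    return mapping


# Made-up vectors and counts, for illustration only.
TOY_VECTORS = {
    "LCD": [0.9, 0.1, 0.0],
    "liquid crystal display": [0.88, 0.12, 0.02],
    "battery": [0.0, 0.2, 0.95],
}
TOY_FREQ = {"LCD": 12, "liquid crystal display": 30, "battery": 7}

pairs = synonym_pairs(TOY_VECTORS)
mapping = standardize(pairs, TOY_FREQ)
```

With these toy inputs, only "LCD" and "liquid crystal display" clear the threshold, and both map to the more frequent form. In practice the vectors would come from a trained embedding model and the threshold would need tuning on labeled synonym pairs.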
