Abstract
This paper applies text mining to extract technical topics from patents and to normalize them, laying the groundwork for technical topic analysis. Based on how technical topics are distributed in patent titles and on the statistical length of the topic words used in technical topic analysis, we propose a method for computing a topic degree score and select the words with the highest topic degree as topic words. Synonym pairs of topic words are then obtained by similarity computation, and topic words are normalized to canonical forms using statistical features. Experimental results show that the proposed topic word extraction method is effective, reaching a precision of 95.5% and a recall of 95.5%; the proposed topic normalization method is also of considerable value.
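The abstract states only that synonym pairs come from a similarity computation and that normalization relies on statistical features. The Python sketch below makes that step concrete under explicit assumptions that are not taken from the paper: similarity is the cosine of word vectors (in the spirit of word2vec-style embeddings), two words count as synonyms when that cosine reaches a hypothetical threshold of 0.8, and the "statistical feature" is term frequency, so each synonym group is normalized to its most frequent member. The vectors, frequencies, threshold, and function names (`synonym_pairs`, `build_normalization_map`, `normalize`) are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def synonym_pairs(vectors, threshold=0.8):
    """Return every word pair whose similarity reaches the threshold.

    The 0.8 threshold is an assumption for illustration, not the paper's value.
    """
    words = sorted(vectors)
    return [
        (w1, w2)
        for i, w1 in enumerate(words)
        for w2 in words[i + 1:]
        if cosine(vectors[w1], vectors[w2]) >= threshold
    ]


def build_normalization_map(pairs, freq):
    """Merge chained synonym pairs with union-find; map each member to one canonical term.

    Assumed rule: the canonical term is the group member with the highest frequency.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for w in list(parent):
        groups.setdefault(find(w), []).append(w)

    mapping = {}
    for members in groups.values():
        # Highest frequency wins; ties are broken alphabetically.
        canonical = min(members, key=lambda w: (-freq.get(w, 0), w))
        for w in members:
            mapping[w] = canonical
    return mapping


def normalize(term, mapping):
    """Terms that never appeared in a synonym pair are returned unchanged."""
    return mapping.get(term, term)


# Made-up 3-dimensional vectors and title frequencies, for illustration only.
vectors = {
    "LCD": [0.9, 0.1, 0.0],
    "liquid crystal display": [0.88, 0.12, 0.02],
    "TFT": [0.1, 0.9, 0.1],
    "thin film transistor": [0.12, 0.85, 0.15],
    "battery": [0.0, 0.1, 0.95],
}
freq = {
    "LCD": 120,
    "liquid crystal display": 45,
    "TFT": 30,
    "thin film transistor": 80,
    "battery": 60,
}

pairs = synonym_pairs(vectors)
mapping = build_normalization_map(pairs, freq)
for term in sorted(vectors):
    print(f"{term} -> {normalize(term, mapping)}")
```

With this toy data, "liquid crystal display" is normalized to "LCD" and "TFT" to "thin film transistor", while "battery", which has no sufficiently similar neighbor, is left as is. Union-find is used so that chains of pairs (A~B, B~C) collapse into one group with a single canonical form; a real system would substitute trained embeddings and whatever frequency statistics the corpus provides.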