Dynamics Analysis of Disciplinary Research Topics Based on Paper Titles
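  • English title: Dynamical Analysis of Discipline Research Topics Based on Articles' Titles
  • Authors: 刘海燕; 张志毅; 尹晓虎
  • Authors (English): LIU Hai-yan; ZHANG Zhi-yi; YIN Xiao-hu; Party School of the CPC Ji'nan Municipal Party Committee; Army Support Department in Northern War Zone; Unit 72465
  • Keywords: research on ideological and political education; topic dynamics; minimum description length; vector representation; convolutional neural network
  • Chinese journal title: QBKX
  • English journal title: Information Science
  • Affiliations: Basic Teaching and Research Department, Party School of the CPC Ji'nan Municipal Party Committee; Army Support Department in Northern War Zone; Unit 72465
  • Publication date: 2019-04-01
  • Publisher: 情报科学 (Information Science)
  • Year: 2019
  • Issue: v.37; No.332
  • Funding: National Social Science Fund of China, Military Science Project "Research on the Mathematical Foundations of Ideological and Political Education in the Armed Forces" (16GJ003-47)
  • Language: Chinese
  • Page: QBKX201904006
  • Page count: 9
  • CN: 04
  • ISSN: 22-1264/G2
  • Classification number: 38-45+138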
Abstract
【Purpose/significance】Digitized resources for Chinese academic literature are incomplete, and many of their metadata fields are of limited use. To address this, the paper establishes a dynamic topic modeling framework that works from paper titles, providing an analytical tool for scientometrics and for tracing how research topics in a discipline evolve and where they are heading.【Method/process】The framework combines natural language processing, the minimum description length principle, word vector representations, unsupervised clustering, and a convolutional neural network classifier. Together these address the problems that conventional topic modeling methods face when applied to paper titles: insufficient word segmentation accuracy, data sparsity, and difficulty in assigning titles to topics. The framework is demonstrated on a large dataset of titles of papers on ideological and political education published since China's reform and opening up.【Result/conclusion】The experiments verify the feasibility of the framework, reveal the distribution and evolution of research topics in ideological and political education over the past 40 years, and provide a baseline and targets for the innovative development of ideological and political education in the new era.
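The abstract names the minimum description length (MDL) principle as a way to improve word segmentation, but does not give the paper's formulation. The sketch below is only a generic illustration of the idea, not the authors' method: it scores a candidate segmentation by a two-part code length (bits to spell out the lexicon plus bits to encode the corpus under a unigram model), so that a lower score indicates a better segmentation. The function name `description_length`, the uniform character code, and the toy English "titles" are all assumptions made for illustration.

```python
import math
from collections import Counter


def description_length(segmented_titles):
    """Two-part MDL cost in bits: lexicon cost + corpus cost (unigram model).

    `segmented_titles` is a list of token lists, one per title.
    This is a hypothetical helper for illustration only.
    """
    tokens = [tok for title in segmented_titles for tok in title]
    counts = Counter(tokens)
    total = len(tokens)

    # Corpus cost: -sum over tokens of log2 p(token), with p estimated
    # by relative frequency in this same corpus.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())

    # Lexicon cost (assumed scheme): spell each distinct token character by
    # character with a uniform code over the observed characters, plus one
    # extra symbol that marks the end of a token.
    alphabet = {ch for tok in counts for ch in tok}
    bits_per_char = math.log2(len(alphabet) + 1)
    lexicon_bits = sum((len(tok) + 1) * bits_per_char for tok in counts)

    return lexicon_bits + corpus_bits


# Toy, invented data: three unsegmented "titles" built from three words.
word_seg = [
    ["topic", "model"],
    ["topic", "dynamics"],
    ["model", "dynamics"],
]
# The same text segmented into single characters.
char_seg = [list("".join(title)) for title in word_seg]

dl_words = description_length(word_seg)
dl_chars = description_length(char_seg)
print(f"word-level segmentation: {dl_words:.1f} bits")
print(f"char-level segmentation: {dl_chars:.1f} bits")
```

On this toy input the word-level segmentation gets the shorter description, because reusing a small lexicon of recurring words is cheaper than encoding every character separately. An MDL-based segmenter searches over candidate segmentations for one that minimizes such a cost.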
