Abstract
This paper proposes a model for recognizing character relationships in novels, based on complex network analysis. Using Jin Yong's 14 martial arts novels as a worked example, it first proposes a noise-reduction analysis framework built on the social network of each novel, then constructs on that basis a model for assessing intimacy between characters and discriminating relationship types, and finally presents a general model for identifying the complex love patterns of a novel's protagonists. Experiments show that the model identifies the complex love patterns in the novels effectively, achieving fairly high accuracy while maintaining recognition efficiency. A variable-scale window was used during model training. As the window shrinks, the recall of the identified protagonist love patterns rises steadily until it levels off, while precision stays relatively stable until the window passes a threshold and then declines steadily. The proposed framework offers a useful reference for analyzing character relationships in long-form fiction, and it could also be applied in book decision-support systems that judge how engaging a novel is or that make personalized content recommendations.
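The abstract does not spell out how the window is applied or how intimacy is scored, so the following is only a minimal sketch under stated assumptions: the text is already tokenized, a fixed set of character names is known, the window is a non-overlapping block of `window` tokens, and a pair is predicted as related when its co-occurrence count reaches `min_count`. The function names, the windowing scheme, and the threshold rule are all hypothetical and are not taken from the paper. The sketch shows how a windowed co-occurrence count and a precision/recall check against hand-labeled pairs could be wired together.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tokens, characters, window):
    """Count, per character pair, the windows in which both names appear.

    Assumption: windows are non-overlapping blocks of `window` tokens.
    """
    counts = Counter()
    for start in range(0, len(tokens), window):
        block = tokens[start:start + window]
        # Distinct character names in this block, sorted so pairs are canonical.
        present = sorted({t for t in block if t in characters})
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def predicted_pairs(counts, min_count):
    """Hypothetical decision rule: keep pairs seen together at least min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a labeled gold set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

Calling `cooccurrence_counts` with several values of `window` and passing each result through `predicted_pairs` and `precision_recall` gives one precision/recall point per window size, which is the kind of curve the abstract describes. Whether a toy setup like this reproduces the reported trend depends on the data and on the real scoring model.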