Research on a Fusion-Feature Method for Disambiguating Chinese Book Author Names (基于融合特征的中文图书作者人名消歧方法研究)
  • English title: Research on Chinese Book Author's Name Disambiguation Based on Fusion Features
  • Author: 李孟亚
  • English author: LI Meng-ya; College of Computer Science, North China University of Technology
  • Keywords (Chinese): 中文图书作者; 人名消歧; 互斥放大; 空缺缩小
  • Keywords (English): Chinese book author; name disambiguation; mutex amplification; vacancy reduction
  • Journal code: DNZS
  • Journal (English name): Computer Knowledge and Technology
  • Institution: College of Computer Science, North China University of Technology
  • Publication date: 2018-04-15
  • Publisher: 电脑知识与技术 (Computer Knowledge and Technology)
  • Year: 2018
  • Volume: v.14
  • Issue: 11
  • Pages: 188-190
  • Page count: 3
  • CN: 34-1205/TP
  • Language: Chinese
  • File ID: DNZS201811076
Abstract
In records of Chinese book authors, it is common for one person to publish under several names and for several people to share one name, and the attributes describing each author vary in completeness. As a result, the accuracy of fusion-feature disambiguation algorithms declines. This paper divides author attributes into three categories: entity features, contextual relationship features, and social relationship features. Using the vector space model, it adjusts attribute weights with an attribute mutex amplification method and feature-matrix weights with a vacancy reduction method, and then computes the similarity between authors. Disambiguation is performed by agglomerative hierarchical clustering, from which a Chinese book author information model is built. The results are evaluated with the B_Cubed metric: the accuracy is 89.42%, and the F-value is 87.45% according to the Chinese abstract but 90.47% according to the original English abstract.
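The abstract names B_Cubed as the evaluation metric but does not show how it is computed. The following is a minimal sketch of the standard B-Cubed precision, recall, and F-value for a clustering result, not the paper's own implementation. The function name `b_cubed`, the dictionary-based input format, and the toy record IDs are assumptions made for illustration only.

```python
from collections import defaultdict


def b_cubed(predicted, gold):
    """Return (precision, recall, f_value) under the standard B-Cubed definition.

    predicted, gold: dicts mapping a record ID to a cluster label.
    Both must cover the same, non-empty set of record IDs.
    (Hypothetical interface; the paper does not specify one.)
    """
    if not predicted or predicted.keys() != gold.keys():
        raise ValueError("predicted and gold must share the same non-empty key set")

    # Group record IDs by cluster label on each side.
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for record, label in predicted.items():
        pred_clusters[label].add(record)
    for record, label in gold.items():
        gold_clusters[label].add(record)

    precision_sum = 0.0
    recall_sum = 0.0
    for record in predicted:
        pred_members = pred_clusters[predicted[record]]
        gold_members = gold_clusters[gold[record]]
        overlap = len(pred_members & gold_members)  # always >= 1 (the record itself)
        precision_sum += overlap / len(pred_members)
        recall_sum += overlap / len(gold_members)

    n = len(predicted)
    precision = precision_sum / n
    recall = recall_sum / n
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Toy data (invented): five book records that share one author name.
gold = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
predicted = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c2", "r5": "c2"}
p, r, f = b_cubed(predicted, gold)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

In this toy case record r3 is placed with the wrong author, so both precision and recall come out at 11/15 (about 0.7333). Averaging per record rather than per cluster is what distinguishes B-Cubed from purity-style scores: every misassigned record lowers the score of each record that shares a cluster with it.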