Research on an Entity-Action Relationship Model for Text Clustering
  • English title: A Entity-Action Relationship Model for Text Clustering
  • Authors: 刘作国 (LIU Zuoguo); 陈笑蓉 (CHEN Xiaorong), College of Computer Science and Technology, Guizhou University
  • Keywords: text representation model; entity-action relationship; sentence pattern recognition; action layer decomposition
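  • Journal (Chinese): 中文信息学报
  • Journal (English): Journal of Chinese Information Processing
  • Affiliation: College of Computer Science and Technology, Guizhou University
  • Publication date: 2018-05-15
  • Year: 2018
  • Volume: 32
  • Issue: 05
  • Pages: 27-35
  • Page count: 9
  • CN: 11-2325/N
  • CNKI journal code: MESS
  • CNKI file code: MESS201805003
  • Funding: National Natural Science Foundation of China (61363028)
  • Language: Chinese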
Abstract
This paper proposes EARM, an Entity-Action Relationship Model for text clustering, and explores how to describe Chinese semantic entities and their behaviors. Chinese is a non-inflectional language: sentences carry no tense or voice inflection, and word classes do not map one-to-one onto syntactic constituents. The paper therefore proposes a syntactic constituent recognition mechanism that identifies entities and actions from the lexical-category and positional features of words. On top of this recognition, syntactic analysis is carried out, and EARM is built by matching sentence-pattern features so as to describe the behaviors and states of entities. For more complex structures such as nested sentences, action layer decomposition is applied during syntactic analysis to break a complex sentence into simple basic sentence patterns, from which entity-action relationships can be mined. Because Chinese grammar is flexible and both omitted constituents and inversion are relatively common, the paper also proposes a recognition mechanism for inverted sentences: an inverted sentence is matched against the closest sentence pattern, and its entities are shifted to restore the normal word order. The paper further describes a statistical-model-based strategy for weighting EARM, quantifies text similarity through the maximum common subgraph of syntax trees, and clusters texts on that basis. EARM entity-action analysis experiments and EARM clustering experiments were designed and conducted; the results show that the EARM analysis is accurate and effective and that the clustering results are reasonable.
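The abstract names the similarity step (maximum common subgraph of syntax trees) but gives no formula. The sketch below is a hypothetical, much simplified illustration, not the paper's EARM: each text is reduced to a set of labeled entity-action edges extracted with a toy positional rule over POS tags ("n" for nouns, "v" for verbs, both assumed tag names), and similarity is the size of the maximum common subgraph divided by the size of the larger graph, as in Bunke and Shearer's MCS-based measure, with size counted here in edges. Because each word appears as at most one node in these graphs, the maximum common edge subgraph reduces to the intersection of the two edge sets. The names `extract_edges`, `text_graph` and `mcs_similarity` are illustrative; the paper's statistical weighting and its handling of nested and inverted sentences are omitted.

```python
from typing import List, Set, Tuple

# Hypothetical types: a tagged sentence is a list of (word, pos) pairs, where
# "n" marks a noun (candidate entity) and "v" a verb (candidate action).
Sentence = List[Tuple[str, str]]
Edge = Tuple[str, str]


def extract_edges(tokens: Sentence) -> Set[Edge]:
    """Toy positional rule: link each verb to the nearest noun on each side."""
    edges: Set[Edge] = set()
    for i, (word, pos) in enumerate(tokens):
        if pos != "v":
            continue
        # Nearest noun to the left is treated as the entity performing the action.
        for left_word, left_pos in reversed(tokens[:i]):
            if left_pos == "n":
                edges.add((left_word, word))
                break
        # Nearest noun to the right is treated as the entity the action applies to.
        for right_word, right_pos in tokens[i + 1:]:
            if right_pos == "n":
                edges.add((word, right_word))
                break
    return edges


def text_graph(sentences: List[Sentence]) -> Set[Edge]:
    """Union of the entity-action edges of all sentences in one text."""
    graph: Set[Edge] = set()
    for sentence in sentences:
        graph |= extract_edges(sentence)
    return graph


def mcs_similarity(g1: Set[Edge], g2: Set[Edge]) -> float:
    """|mcs(g1, g2)| / max(|g1|, |g2|), with graph size counted in edges.

    Every word is a single node, so node labels are unique and the maximum
    common edge subgraph is simply the set of edges present in both graphs.
    """
    larger = max(len(g1), len(g2))
    if larger == 0:
        return 0.0  # arbitrary choice for two empty graphs
    return len(g1 & g2) / larger


if __name__ == "__main__":
    doc_a = text_graph([[("张三", "n"), ("购买", "v"), ("股票", "n")]])
    doc_b = text_graph([[("李四", "n"), ("购买", "v"), ("股票", "n")]])
    print(mcs_similarity(doc_a, doc_b))  # 0.5: only (购买, 股票) is shared
```

A pairwise similarity matrix built this way could be handed to any standard clustering algorithm; the abstract does not state which clustering procedure the paper itself uses.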
