Study of Chinese Word Collocation Feature Extraction and Text Proofreading
  • Original title: 中文词语搭配特征提取及文本校对研究
  • Authors: TAO Yong-cai (陶永才); HAI Zhao-yang (海朝阳); SHI Lei (石磊); WEI Lin (卫琳)
  • Keywords: part-of-speech association; text proofreading; positive correlation; knowledge base; grammar analysis
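  • Journal: Journal of Chinese Computer Systems (小型微型计算机系统)
  • Journal code: XXWX
  • Affiliations: School of Information Engineering, Zhengzhou University; Industrial Technology Research Institute, Zhengzhou University; School of Software, Zhengzhou University
  • Publication date: 2018-11-15
  • Year: 2018
  • Volume: 39
  • Issue: 11
  • Funding: Supported by the Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520027)
  • Language: Chinese
  • Record ID: XXWX201811028
  • Page count: 6
  • Pages: 135-140
  • CN: 21-1106/TP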
Abstract
With the rapid development of the Internet, electronic text plays an increasingly important role in daily life, yet it contains many word-level and grammatical errors, so effective proofreading methods are urgently needed to improve its quality. This paper proposes a text proofreading method based on word collocation relations. The method comprises a two-layer knowledge base (grammar and word collocation) and a word collocation proofreading algorithm that evaluates word pairs under two criteria, mutual information and degree of aggregation. The knowledge base is built in two parts: (1) sentence structure components are extracted from the training corpus and analyzed to build a grammatical component knowledge base; (2) collocation relations between words are learned from the training corpus and filtered by co-occurrence frequency and mutual information to build a word collocation knowledge base. On this basis, mutual information and degree of aggregation are used together to evaluate the strength of association between words and to proofread word collocations. Experimental results show that the F-value of the proposed proofreading model and algorithm is 3.9% higher than that reported in other literature.
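Step (2) of the abstract, filtering candidate collocations by co-occurrence frequency and mutual information, can be illustrated with a small sketch. This is not the authors' implementation: it assumes pre-segmented sentences, treats only adjacent word pairs as candidates, uses pointwise mutual information as the mutual-information score, and leaves out the degree-of-aggregation criterion, which the abstract does not define. The names `build_collocations`, `flag_suspect_pairs`, `min_count`, and `min_pmi` are hypothetical, and the corpus is a toy example.

```python
import math
from collections import Counter


def build_collocations(sentences, min_count=2, min_pmi=0.0):
    """Return {(w1, w2): pmi} for adjacent word pairs that pass both filters.

    Assumption: `sentences` is a list of already segmented word lists.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # adjacent pairs only

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    kept = {}
    for (w1, w2), n in pair_counts.items():
        if n < min_count:  # co-occurrence frequency filter
            continue
        p_pair = n / total_pairs
        p1 = word_counts[w1] / total_words
        p2 = word_counts[w2] / total_words
        pmi = math.log2(p_pair / (p1 * p2))  # pointwise mutual information
        if pmi >= min_pmi:  # mutual information filter
            kept[(w1, w2)] = pmi
    return kept


def flag_suspect_pairs(words, collocations):
    """Return adjacent pairs in `words` that are absent from the knowledge base."""
    return [pair for pair in zip(words, words[1:]) if pair not in collocations]


# Toy training corpus (hypothetical data, already segmented).
corpus = [
    ["提高", "质量"],
    ["提高", "质量"],
    ["提高", "水平"],
    ["改善", "环境"],
    ["改善", "环境"],
    ["改善", "质量"],
]

collocations = build_collocations(corpus)
suspects = flag_suspect_pairs(["提高", "环境"], collocations)
```

Flagging every unseen pair as suspect is a deliberate simplification. A practical checker would also need a second association score and the grammar layer described in the abstract to avoid flagging rare but valid pairs.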
