Cross-domain Chinese Hedge Cue Detection Based on Shared Representations

Original title (Chinese): 基于共享表示的跨领域中文模糊限制语识别
Authors: ZHOU Huiwei (周惠巍); NING Shixian (宁时贤); YANG Yunlong (杨云龙); LIU Zhuang (刘壮); LIN Yingyu (林英玉); LI Sijia (李思嘉)
Keywords: Chinese hedge cue detection; cross-domain; shared representation; adversarial learning
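Journal code: ZZDZ
Journal: Journal of Zhengzhou University (Natural Science Edition)
Affiliations: School of Computer Science and Technology, Dalian University of Technology; College of Information and Electrical Engineering, Feng Chia University, Taiwan
Publication date: 2018-12-17 11:22
Publisher: Journal of Zhengzhou University (Natural Science Edition)
Year: 2019
Volume: 51
Funding: National Natural Science Foundation of China (61772109, 61272375); Humanities and Social Sciences Project of the Ministry of Education (17YJA740076)
Language: Chinese
Record ID: ZZDZ201902006
Page count: 6
Issue: 02
CN: 41-1338/N
Pages: 37-42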

Abstract

To make full use of labeled source-domain data and reduce the annotation cost in the target domain, a cross-domain hedge cue detection method based on shared representations is proposed. The method uses a bidirectional long short-term memory (BiLSTM) network that learns alternately from source-domain and target-domain training data through a parameter-sharing mechanism. Adversarial learning is introduced at the same time to strip the private features of each domain out of the shared features, yielding a semantic representation shared across domains. Experiments on two Chinese domains, biomedical text and Wikipedia, show that the shared-representation method clearly outperforms instance-based and feature-based transfer learning methods on cross-domain Chinese hedge cue detection.
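The training scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give the loss functions, so the entropy-based adversarial term, the weight `lam`, and all function names below are assumptions made for illustration, and the BiLSTM encoder itself is omitted. The sketch shows only two ideas: batches from the two domains are fed alternately to shared parameters, and the shared features are rewarded when a domain discriminator cannot tell which domain they came from.

```python
import math


def alternate_batches(source_batches, target_batches):
    """Yield (domain, batch) pairs, switching domain on every step.

    zip() stops at the shorter list, so extra batches of the longer
    domain are skipped in this simplified schedule.
    """
    for src, tgt in zip(source_batches, target_batches):
        yield "source", src
        yield "target", tgt


def discriminator_loss(domain_probs, true_domain):
    """Cross-entropy of the domain discriminator for one example.

    domain_probs: predicted probabilities over domains (sums to 1).
    true_domain: index of the domain the example really came from.
    """
    return -math.log(domain_probs[true_domain])


def shared_adversarial_loss(domain_probs):
    """Negative entropy of the discriminator output (assumed form).

    It is lowest when the discriminator is maximally uncertain, i.e.
    when the shared features carry no domain-specific signal.
    """
    return sum(p * math.log(p) for p in domain_probs if p > 0.0)


def total_loss(task_loss, domain_probs, lam=0.05):
    """Tagging loss plus the weighted adversarial term (lam is hypothetical)."""
    return task_loss + lam * shared_adversarial_loss(domain_probs)


if __name__ == "__main__":
    for domain, batch in alternate_batches(["bio-1", "bio-2"], ["wiki-1", "wiki-2"]):
        print(domain, batch)
    print(total_loss(1.0, [0.5, 0.5]), total_loss(1.0, [0.9, 0.1]))
```

With the same task loss, `total_loss` is smaller for a discriminator output of `[0.5, 0.5]` than for `[0.9, 0.1]`, which is the pressure that pushes domain-private information out of the shared representation in this kind of setup.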
