Abstract
As Tibetan-language information technology has modernized, the volume of Tibetan text on the web has grown rapidly. Extracting the information that users need from this mass of online text has become a focus of current research. Chinese entity relation extraction has already produced substantial results, whereas Tibetan person attribute extraction still has considerable room for improvement. In our experiments, we select entity position features, inter-entity distance features, and features of the entities and their surrounding words, and convert them into feature vectors. A BP (back-propagation) neural network model is then used to classify and extract the attributes, and it achieves good results. The results can play an important role in applications such as search engines, information security, and machine translation.
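The pipeline summarized above (position, distance, and surrounding-word features fed to a BP neural network) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the English placeholder tokens standing in for segmented Tibetan text, the window size, the network size, and the learning rate are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy vocabulary for surrounding-word features (not from the paper).
VOCAB = ["born", "in", "of", "father", "<pad>"]

def vectorize(tokens, person_idx, value_idx, window=1):
    """Build a feature vector from entity order, entity distance, and nearby words."""
    # Entity position feature: 1.0 if the person precedes the candidate value.
    order = 1.0 if person_idx < value_idx else 0.0
    # Entity distance feature: token gap scaled by sentence length.
    distance = abs(person_idx - value_idx) / len(tokens)
    features = [order, distance]
    # Surrounding-word features: one-hot vectors for the words around each entity.
    for center in (person_idx, value_idx):
        for offset in range(-window, window + 1):
            if offset == 0:
                continue
            pos = center + offset
            word = tokens[pos] if 0 <= pos < len(tokens) else "<pad>"
            features.extend(1.0 if word == v else 0.0 for v in VOCAB)
    return features

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """One hidden layer, one sigmoid output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr=0.5):
        hidden, out = self.forward(x)
        # Output delta for squared error with a sigmoid output unit.
        delta_out = (out - y) * out * (1.0 - out)
        # Propagate the error back to the hidden units before updating w2.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_hidden[j] * xi
            self.b1[j] -= lr * delta_hidden[j]
        return 0.5 * (out - y) ** 2

# Two toy candidates: label 1.0 = "birthplace" attribute, 0.0 = not a birthplace.
pos_vec = vectorize(["Tashi", "born", "in", "Lhasa"], 0, 3)
neg_vec = vectorize(["Tashi", "father", "of", "Dorje"], 0, 3)
data = [(pos_vec, 1.0), (neg_vec, 0.0)]

net = BPNetwork(n_in=len(pos_vec), n_hidden=3)
losses = []
for epoch in range(500):
    losses.append(sum(net.train_step(x, y) for x, y in data))

print("first epoch loss:", losses[0], "last epoch loss:", losses[-1])
```

A real system would derive the vocabulary and the entity indices from a Tibetan word segmenter and named-entity recognizer, and would likely use one output unit per attribute class rather than the single binary output assumed here.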