Research on Intelligent Screening of Corpus Information in Cross-domain Mode

Original title: 跨领域模式下语料库信息智能筛选仿真研究 (literally, "Simulation Research on Intelligent Screening of Corpus Information in Cross-domain Mode")
Author: GUAN Xiao-long (官小龙), College of Foreign Language, Shandong University of Science and Technology
Keywords: cross-domain mode; corpus information; intelligent screening
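Journal code: JSJZ
Journal: Computer Simulation (计算机仿真)
Institution: College of Foreign Language, Shandong University of Science and Technology
Publication date: 2018-09-15
Publisher: Computer Simulation (计算机仿真)
Year: 2018
Volume: 35
Issue: 09
Language: Chinese
File name: JSJZ201809064
Pages: 317-320+387
Page count: 5
CN: 11-3724/TP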

Abstract

Screening corpus information in cross-domain mode can improve the completeness of corpus information extraction. Existing intelligent screening methods for corpus information in cross-domain mode suffer from poor screening quality, long execution time, and high computer memory usage. To address these problems, an intelligent screening method based on a multi-layer vector space model is proposed. The similarity of corpus information texts in cross-domain mode is evaluated using edit distance, vector cosine, and the Jaccard coefficient. According to the evaluation results, the corpus information is divided into multiple relatively independent text segments, and each text segment is assigned a different weight according to the position of the index terms within it. A multi-layer vector space model is then constructed, the model similarity is determined from the text similarity, and the intelligent screening of corpus information in cross-domain mode is completed according to the similarity results. Experimental results show that the proposed method effectively improves the quality of information screening while running faster and occupying less computer memory.
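
The three similarity measures named in the abstract, and the idea of weighting text segments, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the record gives no formulas, so the function names, the normalization of edit distance by the longer string's length, the use of raw term counts for the cosine, and the weighted average in `layered_similarity` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine_similarity(x: list[str], y: list[str]) -> float:
    """Cosine of the angle between raw term-count vectors of two token lists."""
    cx, cy = Counter(x), Counter(y)
    dot = sum(cx[t] * cy[t] for t in cx.keys() & cy.keys())
    norm = sqrt(sum(v * v for v in cx.values())) * sqrt(sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0


def jaccard(x: list[str], y: list[str]) -> float:
    """Jaccard coefficient: shared distinct terms over all distinct terms."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def layered_similarity(doc_a: list[list[str]],
                       doc_b: list[list[str]],
                       weights: list[float]) -> float:
    """Hypothetical multi-layer score: weighted average of the cosine
    similarities of corresponding text segments (one weight per segment)."""
    total = sum(weights)
    score = sum(w * cosine_similarity(sa, sb)
                for w, sa, sb in zip(weights, doc_a, doc_b))
    return score / total if total else 0.0
```

In this sketch a document is a list of tokenized segments, for example a title segment and a body segment, and `weights` would give the title a larger weight than the body. A screening step could then keep the documents whose `layered_similarity` to a query exceeds a chosen threshold; the threshold and the weights are free parameters here, not values taken from the paper.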

