Abstract
Adaptive data deduplication in privacy-preserving databases currently suffers from long deduplication times, a high false-deletion rate, and excessive memory consumption. To address these problems, a deduplication method based on fractional-order transform cumulants is proposed. Duplicate data in the privacy-preserving database are first analyzed to obtain their feature interval, and the interval is smoothed in the feature domain. The variance of the static quantization vector of the duplicate data is then computed, and this variance is used to obtain the random probability of the storage-node subset in the database, from which an information model of the duplicate data stream is built. A fractional-order transform is applied to filter the duplicate-data information in the model, and deduplication is finally carried out by fourth-order cumulant post-aggregation. Simulation experiments show that the proposed method achieves a shorter deduplication time, a lower false-deletion rate, and lower memory consumption.
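As a rough illustration of the statistic named in the final step, the sketch below computes the zero-lag fourth-order cumulant of a real-valued record, c4 = m4 - 3·m2², where m2 and m4 are the second and fourth central moments. The abstract does not specify how post-aggregation is performed, so the grouping shown here is only an assumed stand-in: records are bucketed by their rounded variance and cumulant, and an exact comparison inside each bucket decides whether a record is a duplicate. The names `record_statistics`, `fourth_order_cumulant`, `deduplicate`, and the `digits` parameter are hypothetical, and the fractional-order transform filtering step is omitted entirely.

```python
from statistics import fmean


def record_statistics(samples):
    """Return (variance, fourth-order cumulant) of a non-empty numeric record.

    Both values are computed on mean-removed data, using population
    (divide-by-n) moments: c4 = m4 - 3 * m2**2.
    """
    mu = fmean(samples)
    centered = [x - mu for x in samples]
    m2 = fmean([c * c for c in centered])   # second central moment (variance)
    m4 = fmean([c ** 4 for c in centered])  # fourth central moment
    return m2, m4 - 3.0 * m2 * m2


def fourth_order_cumulant(samples):
    """Zero-lag fourth-order cumulant of a real-valued record."""
    return record_statistics(samples)[1]


def deduplicate(records, digits=9):
    """Keep the first occurrence of each record, preserving input order.

    Assumption (not from the paper): records are bucketed by their rounded
    (variance, cumulant) pair, and only records in the same bucket are
    compared exactly. Identical records always share a bucket, so no true
    duplicate is missed; distinct records that collide are kept apart by
    the exact comparison.
    """
    buckets = {}
    kept = []
    for rec in records:
        variance, c4 = record_statistics(rec)
        key = (round(variance, digits), round(c4, digits))
        candidates = buckets.setdefault(key, [])
        if any(rec == other for other in candidates):
            continue  # exact duplicate of a record already kept
        candidates.append(rec)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    data = [(1.0, 2.0, 3.0, 4.0), (2.0, 3.0, 4.0, 5.0), (1.0, 2.0, 3.0, 4.0)]
    print(deduplicate(data))
```

Because both statistics are invariant to a constant shift, the first two records in the usage example fall into the same bucket even though they differ; the exact comparison is what prevents one of them from being deleted by mistake.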