Time series classification model based on PU learning and self-training

Authors: GUO Zhi-rong; WANG Hui-qing; BAI Ying-ying (College of Computer Science and Technology, Taiyuan University of Technology)
Keywords: time series; semi-supervised learning; positive unlabeled learning; self-training; stopping criteria
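Journal: Computer Engineering and Design (CNKI journal code: SJSJ)
Affiliation: College of Computer Science and Technology, Taiyuan University of Technology
Publication date: 2018-09-16
Year: 2018; Volume: 39; Issue: 09 (cumulative No. 381)
Pages: 88-94 (7 pages)
CN: 11-1775/TP
CNKI file ID: SJSJ201809015
Funding: Shanxi Province Science and Technology Research Program (201603D221037-2)
Language: Chinese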

Abstract

The self-training algorithm can stop prematurely in semi-supervised time series learning. To address this, the paper first analyzes the data distribution in PU learning (positive unlabeled learning) and the iterative process of self-training. It then proposes a time series classification model based on PU learning and an improved self-training algorithm. Depending on the data distribution, the model runs a different number of iterative labeling rounds until all unlabeled data have been labeled. This avoids premature stopping and improves the model's generalization ability. Experimental results show that, for PU time series classification, the model achieves high classification accuracy, recall and F1 score.
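The abstract does not spell out the improved algorithm. The following is only a minimal sketch of the general idea it builds on, under several assumptions of our own:

- A classic one-nearest-neighbour self-training loop for PU time series, using Euclidean distance. The paper may use another distance.
- The loop keeps labeling until every unlabeled series has been ranked, instead of halting early.
- A hypothetical cut-off rule then decides the positive/negative split: cut before the largest jump in nearest-neighbour distance.
- The accuracy, recall and F1 score named in the abstract are computed at the end.

All function names are illustrative and do not come from the paper.

```python
import numpy as np


def self_train_order(X_pos, X_unl):
    """1-NN self-training that continues until every unlabeled series is labeled.

    Returns (order, min_dists): order[i] is the index (into X_unl) of the series
    added to the positive set in round i, and min_dists[i] is its distance to the
    positive set at that moment. Euclidean distance is an assumption.
    """
    pos = np.asarray(X_pos, dtype=float)
    unl = np.asarray(X_unl, dtype=float)
    # d[j]: distance from unlabeled series j to its nearest labeled positive
    d = np.array([np.linalg.norm(pos - u, axis=1).min() for u in unl])
    remaining = list(range(len(unl)))
    order, min_dists = [], []
    while remaining:
        # Label the unlabeled series closest to the current positive set.
        k = min(remaining, key=lambda j: d[j])
        order.append(k)
        min_dists.append(float(d[k]))
        remaining.remove(k)
        # The newly labeled series may now be the nearest neighbour of others.
        for j in remaining:
            d[j] = min(d[j], np.linalg.norm(unl[j] - unl[k]))
    return order, np.array(min_dists)


def pick_cutoff(min_dists):
    """Hypothetical stopping rule: keep every round before the largest distance jump."""
    if len(min_dists) < 2:
        return len(min_dists)
    return int(np.argmax(np.diff(min_dists))) + 1


def predict(order, cutoff, n_unlabeled):
    """Series labeled in the first `cutoff` rounds are positive, the rest negative."""
    y = np.zeros(n_unlabeled, dtype=bool)
    y[order[:cutoff]] = True
    return y


def scores(y_true, y_pred):
    """Accuracy, recall and F1 score for the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    accuracy = float(np.mean(y_true == y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1


if __name__ == "__main__":
    # Toy PU data: positives are noisy sine waves, negatives noisy inverted sines.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 50)
    pos = [np.sin(t) + 0.1 * rng.standard_normal(t.size) for _ in range(10)]
    neg = [-np.sin(t) + 0.1 * rng.standard_normal(t.size) for _ in range(10)]
    X_pos, X_unl = pos[:1], pos[1:] + neg  # one labeled positive, 19 unlabeled
    y_true = [True] * 9 + [False] * 10

    order, min_dists = self_train_order(X_pos, X_unl)
    cut = pick_cutoff(min_dists)
    y_pred = predict(order, cut, len(X_unl))
    print("cutoff:", cut, "accuracy/recall/F1:", scores(y_true, y_pred))
```

The sketch ranks all unlabeled series first and decides where to cut afterwards. This mirrors the abstract's idea of labeling all unlabeled data rather than stopping early.

On this well-separated toy data, the cut falls right after the nine hidden positives. On real data a single largest-jump rule can cut too early or too late. Treat it as a placeholder for a proper stopping criterion, not as the paper's method.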

