Novelty detection in data streams
  • Authors: Elaine R. Faria; Isabel J. C. R. Gonçalves…
  • Keywords: Novelty detection; Data streams; Survey; Classification
  • Journal: Artificial Intelligence Review
  • Publication date: February 2016
  • Volume: 45
  • Issue: 2
  • Pages: 235–269
  • Full-text size: 1,333 KB
  • Author affiliations: Elaine R. Faria (1)
    Isabel J. C. R. Gonçalves (2)
    André C. P. L. F. de Carvalho (3)
    João Gama (4)

    1. Faculty of Computing, Federal University of Uberlândia, Uberlândia, Brazil
    2. Instituto Politécnico de Viana do Castelo, Viana do Castelo, Portugal
    3. Institute of Mathematics and Computer Science (ICMC), University of São Paulo, São Paulo, Brazil
    4. Laboratory of Artificial Intelligence and Decision Support (LIAAD-INESC TEC), University of Porto, Porto, Portugal
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Computer Science, general
    Complexity
  • Publisher: Springer Netherlands
  • ISSN: 1573-7462
Abstract
In massive data analysis, data usually arrive in streams. In recent years, several studies have investigated novelty detection in these data streams, and different approaches have been proposed and validated in many application domains. A review of the main aspects of these studies can provide useful information for improving the performance of existing approaches, adapting them to new applications, and identifying important issues to be addressed in future studies. This article presents and analyses different aspects of novelty detection in data streams, such as the offline and online phases, the number of classes considered in each phase, the use of an ensemble versus a single classifier, supervised and unsupervised approaches to the learning task, the information used to update the decision model, forgetting mechanisms for outdated concepts, the treatment of concept drift, how to distinguish noise and outliers from novelty concepts, classification strategies for data with unknown labels, and how to deal with recurring classes. The article also describes several applications of novelty detection in data streams investigated in the literature and discusses important challenges and future research directions.
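The abstract names several building blocks: an offline phase that learns known classes, an online phase that classifies the stream, handling of examples with unknown labels, and separating noise and outliers from novelty concepts. The Python below is a minimal sketch of how these pieces can fit together. It is an illustrative assumption, not the method of this article or of any specific algorithm it surveys. The class `NoveltyDetector`, the per-class sphere summaries, the leader-style grouping of the unknown buffer, and the parameters `buffer_size`, `min_size` and `link_dist` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

Point = tuple[float, ...]


@dataclass
class Sphere:
    """Hypothetical decision-model unit: a labelled hypersphere."""
    center: Point
    radius: float
    label: str


def summarize(points: list[Point], label: str) -> Sphere:
    """Summarize a group of points by its centroid and maximum distance."""
    dim = len(points[0])
    center = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    radius = max(math.dist(center, p) for p in points)
    return Sphere(center, radius, label)


class NoveltyDetector:
    def __init__(self, training: dict[str, list[Point]],
                 buffer_size: int = 6, min_size: int = 3, link_dist: float = 1.0):
        # Offline phase: one sphere per known class, built from labelled data.
        self.model = [summarize(pts, lbl) for lbl, pts in training.items()]
        self.unknown: list[Point] = []  # examples that no sphere explains
        self.buffer_size = buffer_size  # buffer length that triggers detection
        self.min_size = min_size        # smaller groups are treated as noise
        self.link_dist = link_dist      # grouping threshold for the buffer
        self.n_novel = 0

    def classify(self, x: Point) -> str:
        """Online phase: label x, or store it in the unknown buffer."""
        inside = [s for s in self.model if math.dist(s.center, x) <= s.radius]
        if inside:
            return min(inside, key=lambda s: math.dist(s.center, x)).label
        self.unknown.append(x)
        if len(self.unknown) >= self.buffer_size:
            self._detect_novelty()
        return "unknown"

    def _detect_novelty(self) -> None:
        # Leader-style grouping: each point joins the first group whose
        # seed (first member) lies within link_dist, else starts a new group.
        groups: list[list[Point]] = []
        for p in self.unknown:
            for g in groups:
                if math.dist(g[0], p) <= self.link_dist:
                    g.append(p)
                    break
            else:
                groups.append([p])
        for g in groups:
            # Groups with at least min_size points become new concepts;
            # smaller groups are dropped as noise or outliers.
            if len(g) >= self.min_size:
                self.n_novel += 1
                self.model.append(summarize(g, f"novelty-{self.n_novel}"))
        self.unknown = []


if __name__ == "__main__":
    training = {"A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                "B": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]}
    det = NoveltyDetector(training, buffer_size=4, min_size=3, link_dist=1.0)
    stream = [(0.3, 0.3), (10.2, 10.4), (5.0, 5.0), (20.0, -20.0),
              (5.2, 5.1), (5.1, 5.3), (5.1, 5.1)]
    print([det.classify(x) for x in stream])
    # ['A', 'B', 'unknown', 'unknown', 'unknown', 'unknown', 'novelty-1']
```

In this run, the fourth unexplained example fills the buffer and triggers detection: the three points near (5, 5) form a group large enough to become a new concept, while the isolated point at (20, -20) is discarded as noise. Aspects the abstract also lists, such as forgetting outdated concepts, concept-drift treatment and recurring classes, are deliberately left out of this sketch.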
