Study on Frequent Data Mining from Uncertain Data Streams

Original title: 不确定数据流中频繁数据挖掘研究
Author: Tang Keming (汤克明)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: uncertain data streams; frequent item queries; top-k queries; frequent closed itemset mining; frequent quantitative interval pattern mining; sliding window model; sample mining model
Degree year: 2012
Advisor: Chen Ling (陈崚)
Discipline code: 081203
Degree-granting institution: Nanjing University of Aeronautics and Astronautics (南京航空航天大学)
Submission date: 2012-06-01

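Abstract

With the rapid development of computer and communication technology, sensor networks, Web services, and RFID have come into wide use, and uncertain data management has drawn broad attention as a result. Uncertain data streams play a key role in many real applications, such as economic forecasting, financial information analysis, ecological environment monitoring, network security monitoring, and logistics management. Traditional data management techniques cannot manage this new kind of uncertain data stream effectively, which has prompted academia and industry to develop new management techniques for uncertain data streams. Data mining on uncertain data streams has therefore become an active research topic in the data mining field.
     Current work on mining uncertain data streams concentrates on clustering, frequent pattern mining, Skyline queries, data lineage analysis, and anomaly analysis. Building on an in-depth study of uncertain data stream mining techniques in China and abroad, this thesis reviews the state of research on frequent data mining over uncertain data streams. Frequent data mining underlies association rule mining, classification, clustering, and other mining tasks on uncertain data streams, and so occupies an important position in the field. This thesis therefore studies frequent data mining on uncertain data streams in depth and proposes effective algorithms for it. The main contributions are:
     (1) We propose SWBUFIM, a sliding-window-based algorithm for frequent item queries over uncertain data streams. From the essential properties of frequent items and Markov's inequality we derive two pruning rules, which are used to preprocess the stream and discard tuples that cannot become frequent items. On this basis, we first use dynamic programming to compute the expected probability within a guaranteed time bound. Second, because different data items are mutually independent, we open a sub-sliding window for each data item and partition the problem into rows and columns according to the number of item combinations. Building on the dynamic programming method, we further improve the expected probability computation so that only the first k-1 items in the sliding window need to be processed by dynamic programming, which again keeps the computation within a guaranteed time bound. Experiments show that SWBUFIM processes queries quickly and that its space complexity grows linearly with the size of the data processed.
     (2) We propose MPTopKTS, a sliding-window-based top-k query algorithm for uncertain data streams. Starting from the definition of the top-k query and the characteristics of uncertain data streams and their sliding windows, we study the sliding-window top-k query problem and propose an algorithm that finds, across all possible worlds, the top-k tuple set whose members have relatively high scores and which has the maximum probability of occurrence (MPTopKTS). The algorithm builds synopsis tables over the sliding window and updates them at each time step, which effectively reduces the complexity of the top-k query problem. It balances query accuracy against query cost, obtaining high-quality approximate results at small computational cost. Experiments show that the algorithm outperforms similar algorithms in time and space complexity.
     (3) We propose MFCIFUDS, a sliding-window-based sampling algorithm for mining frequent closed itemsets from uncertain data streams. Using random sampling probabilities, the algorithm first converts transactions composed of uncertain data into transactions composed of deterministic data, and then applies frequent closed itemset mining techniques designed for the deterministic data model. We prove theoretically that it is feasible to solve uncertain data mining problems by sampling and then applying deterministic mining algorithms, and we propose an improved technique for building and updating the frequent pattern tree that effectively speeds up FP-tree-based frequent closed itemset mining. Experiments show that MFCIFUDS achieves high mining accuracy and processing speed.
     (4) We propose MFIPatFUS, a sliding-window-based algorithm for mining frequent quantitative interval patterns from uncertain data streams. Unlike conventional uncertain data streams of binary itemset transactions, uncertain data streams of quantitative interval transactions represent each transaction attribute as a quantitative interval; the uncertainty lies in the fluctuation of the interval's range, and the distribution over the interval follows some probability distribution. Drawing on conventional frequent-pattern-tree-based algorithms for uncertain data streams, we design a frequent quantitative interval pattern tree, FIPatTree, to capture the quantitative interval information of all transactions in the stream. We take the boundary values of the original quantitative intervals as base elements and build base quantitative intervals from their distribution. The original intervals are then re-partitioned into base quantitative intervals, and the probability of each base quantitative interval is determined by the proportion of the original interval that its numeric range covers. MFIPatFUS adopts the sliding window model and uses FIPatTree as its synopsis data structure; transaction attributes are stored in the tree as base quantitative interval nodes. The tree is built much like a conventional frequent pattern tree, except that nodes can be shared only when both the attribute's base quantitative interval and its occurrence probability are identical. Shared nodes carry frequency and local probability statistics, and an attribute index and a base quantitative interval index associated with the FIPatTree are added to ease traversal and updating. Mining over FIPatTree proceeds recursively using projection bases and conditional trees. Experiments show that MFIPatFUS, under the sliding window model, is effective for mining frequent quantitative interval patterns from uncertain data streams composed of quantitative interval transactions.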
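As an illustration of the ideas behind contribution (1), the following is a minimal sketch and not the SWBUFIM algorithm itself. It assumes a common definition that the abstract does not spell out: each tuple in the window contains the item independently with a known probability, and the item counts as frequent when the probability of at least `min_count` occurrences reaches a threshold `tau`. The names `markov_upper_bound`, `prob_at_least`, `is_frequent`, `min_count`, and `tau` are hypothetical. The pruning step uses Markov's inequality, P(X >= c) <= E[X]/c, and the dynamic program tracks only the states below `min_count`.

```python
from typing import Sequence


def markov_upper_bound(probs: Sequence[float], min_count: int) -> float:
    """Upper bound on P(count >= min_count) from Markov's inequality."""
    return min(1.0, sum(probs) / min_count)


def prob_at_least(probs: Sequence[float], min_count: int) -> float:
    """P(the item occurs in at least min_count tuples), tuples independent.

    Only the states 0..min_count-1 are tracked; whatever probability mass
    is not in those states is the answer.
    """
    if min_count <= 0:
        return 1.0
    # below[j] = probability of exactly j occurrences so far, for j < min_count
    below = [1.0] + [0.0] * (min_count - 1)
    for p in probs:
        # Go downward so below[j - 1] still holds the previous step's value.
        for j in range(min_count - 1, 0, -1):
            below[j] = below[j] * (1.0 - p) + below[j - 1] * p
        below[0] *= 1.0 - p
    return 1.0 - sum(below)


def is_frequent(probs: Sequence[float], min_count: int, tau: float) -> bool:
    """Assumed definition: frequent iff P(count >= min_count) >= tau."""
    if markov_upper_bound(probs, min_count) < tau:
        return False  # pruned without running the dynamic program
    return prob_at_least(probs, min_count) >= tau


if __name__ == "__main__":
    window = [0.9, 0.9, 0.9]  # hypothetical existence probabilities
    print(is_frequent(window, min_count=2, tau=0.9))
```

The dynamic program costs time proportional to the window size times `min_count` and keeps only `min_count` states, which is one way to read the abstract's remark that only the first k-1 entries need to be processed.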
With the rapid development of computer and communication technology and the wide application of wireless sensor networks, Web services, and RFID, uncertain data management has gained considerable attention. Uncertain data streams play an important role in many practical scenarios, such as economic forecasting, financial information analysis, ecological environment monitoring, network security monitoring, and logistics management. Traditional data management technology, however, cannot handle this new type of uncertain data stream effectively, so designing new management technology for uncertain data streams has drawn significant interest from industry and academia. Data mining on uncertain data streams has thus become a research hotspot.
     Data mining on uncertain data streams mainly focuses on clustering, frequent pattern mining, skyline queries, data lineage analysis, and outlier analysis. Based on a thorough understanding of related work, this thesis discusses the state of the art in frequent data mining on uncertain data streams. As the basis of several mining tasks such as association rule mining, classification, and clustering, frequent data mining holds an important position in uncertain data stream mining. This thesis therefore investigates frequent data mining on uncertain data streams in depth and proposes effective algorithms for it. The main contributions of the thesis are as follows:
     (1) We propose a sliding-window-based algorithm, SWBUFIM, for frequent item queries on uncertain data streams. Based on the characteristics of frequent items and Markov's inequality, we give two pruning rules that discard items which cannot become frequent in the uncertain data stream. We employ dynamic programming to compute the expected probability in bounded time. Since different data items are mutually independent, we present a model that opens a sub-sliding window for each data item and handles the frequent item mining problem by partitioning the processing according to the number of combinations. Building on the dynamic programming method, we further improve the probability computation so that the expected probability is computed efficiently in bounded time by processing only k-1 rows in the sliding window.
     (2) We propose a sliding-window-based top-k query algorithm, MPTopKTS, for uncertain data streams. Based on the characteristics of uncertain data streams and their sliding windows, we investigate the top-k query problem and propose a query algorithm for the top-k tuple set with relatively high score values and maximal probability of occurrence. To reduce the time complexity of top-k queries, the algorithm builds synopsis tables over the sliding window, and we give an effective method for updating the synopsis tables at each time step. The algorithm also balances query accuracy against time cost: it obtains high-quality approximate results at a small computing overhead. The experimental results demonstrate that our query algorithm is more efficient than previous work in time and space complexity.
     (3) We propose a sampling-based mining algorithm, MFCIFUDS, for detecting frequent closed itemsets in uncertain data streams based on a sliding window. Given the characteristics of frequent closed itemsets in uncertain data streams, we first transform transactions composed of uncertain data into transactions composed of certain data by random sampling, and then detect the frequent closed itemsets using mining techniques for certain data streams. We prove theoretically that the sampling-based approach is feasible. In addition, we propose an improved algorithm for constructing and updating the frequent pattern tree to speed up frequent closed itemset mining. The experimental results demonstrate that MFCIFUDS has high mining accuracy and processing speed.
     (4) We propose a frequent quantitative interval pattern mining algorithm, MFIPatFUS, for uncertain data streams based on a sliding window. Unlike regular uncertain data streams of binary itemset transactions, in uncertain data streams of quantitative interval transactions the value of each transaction attribute is a quantitative interval. The interval's uncertainty is reflected in the fluctuation of its range, and values within it follow some probability distribution. Drawing on uncertain data stream frequent pattern mining algorithms based on the regular frequent pattern tree, we present an algorithm built on a frequent quantitative interval pattern tree, FIPatTree, which captures the quantitative interval information of all transactions in the uncertain data stream. FIPatTree uses the boundary values of the original intervals as base elements and establishes base quantitative intervals according to the distribution of the base elements. The original intervals are re-partitioned over these base quantitative intervals, and the probability of each base quantitative interval is computed from the proportion of the original numeric range that it covers. MFIPatFUS employs the sliding window model and uses FIPatTree as its synopsis data structure. Transaction attributes are stored in FIPatTree, where each node represents a base quantitative interval. The tree is constructed like a regular frequent pattern tree, except that nodes can be shared only when both the base quantitative intervals and the occurrence probabilities of the attributes are identical. We keep frequency and partial probability statistics for the shared nodes, and we add indexes of the attributes and the base quantitative intervals on the FIPatTree to make updating and traversal easier. The experimental results demonstrate that the sliding-window-based mining algorithm MFIPatFUS can effectively mine frequent quantitative interval patterns from uncertain data streams with quantitative interval transactions.
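The base-interval construction described in (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's FIPatTree code: intervals are plain `(low, high)` pairs with `low < high`, and a base interval's probability is its width divided by the width of the original interval, as the abstract's "proportion" wording suggests. The function names `base_intervals` and `decompose` are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def base_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Use every boundary value as a base element and pair up
    consecutive sorted boundaries to form the base intervals."""
    points = sorted({b for lo, hi in intervals for b in (lo, hi)})
    return list(zip(points, points[1:]))


def decompose(interval: Interval,
              bases: Sequence[Interval]) -> List[Tuple[Interval, float]]:
    """Split one original interval into the base intervals it covers.

    Each base interval gets probability width(base) / width(original).
    Assumes both boundaries of `interval` were among the boundary
    values used to build `bases`, so that the pieces tile it exactly.
    """
    lo, hi = interval
    if not lo < hi:
        raise ValueError("interval must satisfy low < high")
    width = hi - lo
    return [((b_lo, b_hi), (b_hi - b_lo) / width)
            for b_lo, b_hi in bases
            if lo <= b_lo and b_hi <= hi]


if __name__ == "__main__":
    # Hypothetical attribute values from two transactions.
    originals = [(10, 20), (15, 30)]
    bases = base_intervals(originals)
    for original in originals:
        print(original, decompose(original, bases))
```

With the two hypothetical intervals above, the boundaries 10, 15, 20, and 30 yield three base intervals, and the overlapping piece (15, 20) appears in both decompositions with different probabilities. That is consistent with the abstract's rule that tree nodes are shared only when both the base interval and its probability match.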

