Mining massive data streams.
Details
  • Author: Hulten, Geoffrey
  • Degree: Doctoral
  • Year: 2005
  • Advisor: Domingos, Pedro
  • Institution: University of Washington
  • Field: Computer Science
  • ISBN: 9780542176296
  • CBH: 3178083
  • Country: USA
  • Language: English
  • File size: 8924581
  • Pages: 168
Abstract
Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. In this thesis we develop a method that can semi-automatically enhance a wide class of existing learning algorithms so that they can learn from such high-speed data streams in real time. In particular, our method can be applied to essentially any induction algorithm based on discrete search. After our method is applied, the algorithm learns from data streams in an incremental, anytime fashion; runs in time independent of the amount of data seen, while making decisions that are essentially identical to those that would be made from infinite data; uses a constant amount of RAM no matter how much data it sees; and adjusts its learned models in a very fine-grained manner as the data-generating process changes over time. We evaluate our method by using it to produce a series of learning algorithms (for decision trees, Bayesian network structure, and clustering), all of which are capable of learning from high-speed data streams. We evaluate these learners with extensive studies on synthetic data sets, and by applying them to a collection of massive real-world mining tasks.
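The abstract does not say how a stream learner knows it has seen enough data to make a decision "essentially identical" to the infinite-data one. A common tool for this, used in Domingos and Hulten's related very fast decision tree (VFDT) work, is the Hoeffding bound: after n independent observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the sample mean with probability 1 - delta. The sketch below illustrates that idea only; it is not the thesis's own code, and the function names `hoeffding_epsilon`, `can_decide`, and `examples_needed` are hypothetical.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a variable
    whose values span value_range is within the returned epsilon of the mean
    of n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_decide(best: float, runner_up: float, value_range: float,
               delta: float, n: int) -> bool:
    """Hypothetical decision rule: commit to the best candidate (e.g. a split)
    once its observed advantage over the runner-up exceeds epsilon, so the
    same choice would be made from infinite data with probability 1 - delta."""
    return best - runner_up > hoeffding_epsilon(value_range, delta, n)


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Invert the bound: smallest n with hoeffding_epsilon(...) <= epsilon.
    Note that n does not depend on the total size of the stream."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Scores assumed to lie in [0, 1], so value_range = 1.
    print(hoeffding_epsilon(1.0, 1e-7, 1000))
    print(can_decide(0.30, 0.18, 1.0, 1e-7, 1000))
    print(examples_needed(1.0, 1e-7, 0.05))
```

The point of the sketch is that the number of examples needed depends only on R, delta, and the gap to be resolved, not on how much data the stream eventually contains, which is consistent with the abstract's claim of running time independent of the amount of data seen.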
