A variational approach towards distributed data mining.
Details
  • Author:Mukherjee, Sourav
  • Degree:Master
  • Year:2007
  • Advisor:Kargupta, Hillol
  • Institution:University of Maryland
  • Major:Computer Science
  • ISBN:9780549407096
  • CBH:1451476
  • Country:USA
  • Language:English
  • FileSize:450580
  • Pages:88
Abstract
This thesis presents a general framework for applying the variational approximation technique to problems in Distributed Data Mining. Distributed Data Mining aims to analyze distributed data in order to extract useful information while paying attention to computational cost, communication cost, storage requirements, and human-computer interaction. The thesis shows that the variational method, a deterministic approximation technique, can be applied to formulate communication-efficient and scalable solutions to Distributed Data Mining problems. Two important problems are chosen as illustrations: Distributed Probabilistic Inferencing in Graphical Models, and Distributed Linear Regression over Vertically Partitioned Data. In both cases, analytical results are derived to demonstrate that the variational method leads to communication-efficient and scalable solutions, and these claims are validated experimentally. The experiments show that sufficient accuracy can be achieved even with modest communication-bandwidth allowances, and they indicate that the variational techniques are highly scalable. The variational approximation is thus established as a framework suitable for formulating efficient solutions to Distributed Data Mining problems.

The thesis is organized as follows. Chapter 1 introduces and motivates Distributed Data Mining, discussing its importance and the challenges it poses. The chapter also gives a brief introduction to the basic ideas of the variational approximation technique, reviews the existing literature on Distributed Data Mining and variational methods, and enumerates the contributions of the thesis. The subsequent chapters consider concrete problems in Distributed Data Mining and apply the variational method to solve them.

Chapter 2 considers the problem of Distributed Probabilistic Inferencing in a Graphical Model. It presents an algorithm, VIDE (Variational Inferencing in Distributed Environments), that achieves considerable accuracy while using much less communication than a complete centralization of the data would require.

Chapter 3 considers a second problem: Linear Regression in a Heterogeneously Distributed Environment. The variational method is applied to formulate algorithms both for learning the linear model and for using it for predictive modeling. Here, too, the variational algorithms are shown to be much more communication-efficient than techniques that rely on full centralization of the data.

Finally, Chapter 4 concludes the thesis.
