A comparison of statistical spam detection techniques.
Details
  • Author:Brown, Kevin Alan
  • Degree:Master
  • Year:2006
  • Advisor:Chandler, J. P.
  • Institution:Oklahoma State University
  • Major:Computer Science
  • ISBN:9780542601392
  • CBH:1433591
  • Country:USA
  • Language:English
  • FileSize:638843
  • Pages:108
Abstract
Spam (unsolicited and undesirable email) has become a significant problem for email users. This study investigated the current state of the art in statistical spam filtering. Established methods, inspired by the work of Paul Graham, were examined, and new techniques were introduced and tested. Tests were conducted using two private corpora of email messages and one publicly available corpus.

A base configuration of a spam filter program, similar in technique to a popular production spam filter, was implemented and tested. This configuration achieved high accuracy while maintaining a low false positive rate. One main objective of this study was to develop a new weighted token probability function. Header fields carry important information, and it was believed that weighting header data more heavily than data in the body of the message could improve accuracy. The new weighted token probability function strengthens or weakens header and phrase tokens. When headers are weighted, the weight is applied to every token taken from a header field, while all body tokens are given unit weight.
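The abstract does not state the form of the weighted token probability function, so the following is only a minimal sketch of one way such weighting could work, not the thesis's actual method. It assumes each token already has a spam probability, scales that token's log-odds by a weight (above 1 strengthens the token, below 1 weakens it, exactly 1 leaves it unchanged), and combines tokens naive-Bayes style. The function names, the clamping bounds, the default `header_weight`, and the `unknown_prob` value are all hypothetical choices made for illustration. Phrase tokens are not modeled here.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def weighted_token_probability(p, weight):
    """Strengthen (weight > 1) or weaken (weight < 1) a token probability.

    Assumption: weighting is done by scaling the token's log-odds.
    The probability is clamped first so the logarithm stays finite.
    """
    p = min(0.99, max(0.01, p))
    return sigmoid(weight * logit(p))


def message_spam_score(tokens, token_probs, header_weight=2.0, unknown_prob=0.4):
    """Combine weighted token probabilities into one message score.

    tokens: iterable of (text, is_header) pairs.
    token_probs: dict mapping token text to its spam probability.
    Header tokens get header_weight; body tokens get unit weight.
    Summing log-odds is the naive Bayes combination with equal priors.
    """
    total = 0.0
    for text, is_header in tokens:
        p = token_probs.get(text, unknown_prob)
        weight = header_weight if is_header else 1.0
        total += logit(weighted_token_probability(p, weight))
    return sigmoid(total)


if __name__ == "__main__":
    # Hypothetical token probabilities, for illustration only.
    probs = {"from:promo": 0.9, "meeting": 0.2}
    message = [("from:promo", True), ("meeting", False)]
    print(message_spam_score(message, probs, header_weight=1.0))
    print(message_spam_score(message, probs, header_weight=2.0))
```

With a header weight above 1, a spam-indicating header token pulls the message score further toward spam than the same token would in the body, which is the effect the abstract describes for header weighting.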
