Visual-based web page analysis.
Details
  • Author: Lee, Kuang-Yao
  • Degree: M.S.
  • Year: 2014
  • Institution: San Diego State University
  • Department: Computer Science
  • ISBN: 9781303884436
  • CBH: 1555616
  • Country: USA
  • Language: English
  • FileSize: 2974491
  • Pages: 47
Abstract
This research investigates how to identify the different content areas on a webpage by comparing the visual features and relative characteristics of each content area, called a visual block in this study. The process uses image segmentation to extract and parse a webpage's visual features, then analyzes them to identify the functionality of each content area based on its layout and position. To accomplish this, the study reviews several techniques that have been used in related fields and discusses their strengths and weaknesses. The main weakness of past techniques is that they rely heavily on HTML; in other words, they are language-dependent. This paper proposes a visual-based technique that uses visual features rather than HTML; hence it is more language-independent. To determine the functionality of each visual block, the technique uses an algorithm that parses webpages into a tree structure and applies a rule modeled on how humans determine the relationship between two objects on a 2D monitor. The goal of this research is to design an automated visual-based algorithm that examines each visual block shown on the webpage and applies human cognitive processes to decide the role of each block. For example, one might wish to identify the main content, the sub content, the navigation menu, and the advertisement. Chapter 1 describes the motivation, the issue, and a possible solution to the problem. Chapter 2 reviews several technologies that can be used to solve the problem and elucidates possible future research. Chapter 3 explains how the test environment was prepared and which techniques were used. Chapter 4 describes the results: what was accomplished, what was missing, and what further research is necessary. Chapter 5 concludes with the possibilities of this research and how future research might help accomplish its final goal.
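
The tree-and-rule idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the `VisualBlock` class, the `classify` and `label_leaves` functions, and every threshold below are hypothetical, chosen only to show how a role might be guessed from a block's layout and position without reading any HTML.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualBlock:
    """A rectangular region of a rendered page, in pixels (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    children: List["VisualBlock"] = field(default_factory=list)

    @property
    def area(self) -> int:
        return self.width * self.height


def classify(block: VisualBlock, page_w: int, page_h: int) -> str:
    """Guess a block's role from position and size alone.

    The thresholds are illustrative assumptions, not rules from the thesis.
    """
    center_x = (block.x + block.width / 2) / page_w
    area_share = block.area / (page_w * page_h)
    wide = block.width >= 0.8 * page_w
    # A wide, short strip near the top of the page looks like a menu bar.
    if wide and block.y < 0.15 * page_h and block.height < 0.15 * page_h:
        return "navigation menu"
    # The block covering the largest share of the page is taken as main content.
    if area_share >= 0.4:
        return "main content"
    # A narrow column hugging the left or right edge is treated as an ad slot.
    if block.width <= 0.25 * page_w and (center_x < 0.2 or center_x > 0.8):
        return "advertisement"
    return "sub content"


def label_leaves(root: VisualBlock, page_w: int, page_h: int) -> List[Tuple[VisualBlock, str]]:
    """Walk the block tree depth-first and label each leaf block."""
    if not root.children:
        return [(root, classify(root, page_w, page_h))]
    labeled = []
    for child in root.children:
        labeled.extend(label_leaves(child, page_w, page_h))
    return labeled
```

In this sketch the tree would come from an image-segmentation step that is not shown here; only the leaves are labeled, and each label depends solely on the block's geometry relative to the page.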
