Evaluating the performance of latent semantic indexing.
Details
  • Author: Suwannajan, Pakinee
  • Degree: Doctoral
  • Year: 2005
  • Advisor: Jessup, Elizabeth
  • Institution: University of Colorado
  • Subject: Computer Science; Mathematics
  • ISBN: 0542179563
  • CBH: 3178359
  • Country: USA
  • Language: English
  • FileSize: 2830645
  • Pages: 119
Abstract
Information Retrieval (IR) has emerged in various fields such as the Web, bibliography systems, and digital libraries. Data indexing and retrieval are parts of IR and have been of interest to computer and information scientists in recent years. One of the most popular IR models is the vector space model. It was developed to solve many problems associated with exact lexical matching. The vector space model employs linear algebra tools to find the similarity between a document and a query. Latent Semantic Indexing (LSI), a widely used variant of the vector space model, was designed to overcome problems arising from synonymy and polysemy. It is often claimed in the literature that LSI outperforms the vector space model. We discovered that LSI performs better than the vector space model only in some cases, specifically when the amount of information that a query shares with the relevant documents is greater than the amount it shares with the non-relevant documents. We also studied the capability of LSI in solving synonymy and polysemy problems. Synonyms are different words that have the same meaning, whereas a polyseme is a single word that has multiple meanings. We discovered that LSI can distinguish between two synonymous words only when they both appear in the same or similar contexts. For polysemy, LSI outperforms the vector space model only when two contexts that use different meanings of a polyseme share at least some information.
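
The contrast drawn in the abstract between the vector space model and LSI can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the five-term, three-document toy collection, the helper names `cosine_to_columns`, `vsm_scores`, and `lsi_scores`, the rank `k=2`, and the convention of comparing the projected query with the documents in the rank-k latent space (one of several conventions in use). The two synonyms "car" and "automobile" are deliberately placed in a shared context ("engine").

```python
import numpy as np

# Hypothetical toy collection (not from the dissertation).
# Rows are terms, columns are documents:
#   d0 = "car engine", d1 = "automobile engine", d2 = "flower petal"
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # petal
], dtype=float)

# Query consisting of the single term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)


def cosine_to_columns(vec, mat):
    """Cosine similarity between vec and every column of mat."""
    norms = np.linalg.norm(mat, axis=0) * np.linalg.norm(vec)
    return (vec @ mat) / norms


def vsm_scores(A, q):
    """Vector space model: cosine in the original term space."""
    return cosine_to_columns(q, A)


def lsi_scores(A, q, k):
    """LSI sketch: cosine in the rank-k space of a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                       # leading k left singular vectors
    docs_k = np.diag(s[:k]) @ Vt[:k, :]  # documents in latent coordinates
    q_k = U_k.T @ q                      # query projected into the same space
    return cosine_to_columns(q_k, docs_k)


vsm = vsm_scores(A, q)
lsi = lsi_scores(A, q, k=2)
for j in range(A.shape[1]):
    print(f"d{j}: vsm={vsm[j]:.3f}  lsi={lsi[j]:.3f}")
```

In this toy case the vector space model gives d1 a score of zero because it shares no term with the query, while the rank-2 LSI score for d1 is close to 1 because "car" and "automobile" both co-occur with "engine". This is consistent with the abstract's observation that LSI handles synonyms when they appear in the same or similar contexts; it says nothing about larger collections.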
