Evaluating the performance of latent semantic indexing.

Details

Author: Suwannajan, Pakinee
Degree: Doctorate
Year: 2005
Advisor: Jessup, Elizabeth
Institution: University of Colorado
Fields: Computer Science; Mathematics
ISBN: 0542179563
CBH: 3178359
Country: USA
Language: English
FileSize: 2830645
Pages: 119

Abstract

Information retrieval (IR) is used in many settings, such as the Web, bibliographic systems, and digital libraries. Data indexing and retrieval are core parts of IR and have attracted the interest of computer and information scientists in recent years. One of the most popular IR models is the vector space model, which was developed to address many of the problems associated with exact lexical matching. The vector space model uses linear algebra tools to measure the similarity between a document and a query. Latent Semantic Indexing (LSI), a widely used variant of the vector space model, was designed to overcome problems arising from synonymy and polysemy.

It is often claimed in the literature that LSI outperforms the vector space model. We found that LSI performs better than the vector space model only in some cases, specifically when a query shares more information with the relevant documents than with the non-relevant documents. We also studied how well LSI handles synonymy and polysemy. Synonyms are different words with the same meaning, whereas a polyseme is a single word with multiple meanings. We found that LSI can recognize the relationship between two synonymous words only when both appear in the same or similar contexts. For polysemy, LSI outperforms the vector space model only when the two contexts that use different meanings of a polyseme share at least some information.
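To make the two models concrete, the following is a minimal pure-Python sketch, not the method or experimental setup of this dissertation. The vector space model scores each document by the cosine between its term-count column and the query vector. LSI first replaces the term-document matrix with a rank-k truncated SVD and then compares the query and documents in that k-dimensional space. Several things here are assumptions chosen for illustration: the toy three-document corpus, all helper names (`build_term_doc_matrix`, `lsi_scores`, and so on), the use of raw term counts with no weighting, the power-iteration SVD (a stand-in for a library routine such as a LAPACK-based SVD), and the projection convention (both query and documents are mapped by U_k^T). LSI implementations differ in how they scale these projections.

```python
import math
import random


def build_term_doc_matrix(docs):
    """Return (vocab, A) where A[i][j] is the raw count of term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.split():
            matrix[index[w]][j] += 1.0
    return vocab, matrix


def query_vector(query, vocab):
    """Map a query to term counts; out-of-vocabulary words are ignored."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for w in query.split():
        if w in index:
            vec[index[w]] += 1.0
    return vec


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0


def vsm_scores(matrix, q):
    """Vector space model: cosine between the query and each document column."""
    n_docs = len(matrix[0])
    return [cosine(q, [row[j] for row in matrix]) for j in range(n_docs)]


def top_singular_triplets(matrix, k, iters=500, seed=0):
    """Rank-k SVD via power iteration with deflation on A^T A (toy scale only)."""
    m, n = len(matrix), len(matrix[0])
    # Gram matrix G = A^T A (n x n), symmetric positive semidefinite.
    gram = [[sum(matrix[t][i] * matrix[t][j] for t in range(m)) for j in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue lambda = sigma^2.
        lam = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # remaining singular values are numerically zero
        sigma = math.sqrt(lam)
        u = [sum(matrix[t][j] * v[j] for j in range(n)) / sigma for t in range(m)]
        triplets.append((sigma, u, v))
        # Deflate so the next pass converges to the next singular vector.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return triplets


def lsi_scores(matrix, q, k):
    """LSI: cosine between query and documents in the space spanned by U_k."""
    triplets = top_singular_triplets(matrix, k)
    n_docs = len(matrix[0])
    # Document j maps to U_k^T a_j, whose i-th entry is sigma_i * v_i[j].
    docs_k = [[s * v[j] for s, _, v in triplets] for j in range(n_docs)]
    # The query maps to U_k^T q.
    q_k = [sum(ui * qi for ui, qi in zip(u, q)) for _, u, _ in triplets]
    return [cosine(q_k, d) for d in docs_k]


if __name__ == "__main__":
    docs = ["car engine", "automobile engine", "banana fruit"]
    vocab, A = build_term_doc_matrix(docs)
    q = query_vector("car", vocab)
    print([round(s, 3) for s in vsm_scores(A, q)])  # [0.707, 0.0, 0.0]
    print(lsi_scores(A, q, k=2))
```

In this toy corpus, "car" and "automobile" never co-occur but share the context word "engine". The vector space model therefore gives "automobile engine" a score of exactly 0 for the query "car", while the rank-2 LSI score for that document is close to 1 and "banana fruit" stays near 0. This is a small instance of the abstract's point that LSI relates synonyms only when they appear in similar contexts.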