Research on Personalized Web Search Systems
Abstract
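With the rapid development of the Internet, the amount of information on the Web is growing exponentially. How effectively and accurately we can find the content we are interested in determines whether we can make full use of this enormous information resource. This has become a central problem in Internet-based information acquisition, and it is also the research goal of this thesis.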
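    This thesis first analyzes the basic problems of searching for information on the Web, including: the relationship between information retrieval and information filtering; the characteristics and classification of information filtering systems; the modes of information filtering; and filtering models, relevance-computation algorithms, and performance evaluation metrics for search systems. It then discusses search systems as a whole and identifies the problems that remain in current Web search systems.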
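The abstract does not say which relevance-computation algorithm the thesis analyzes. Purely as an illustration, the following Python sketch shows one widely used option from the vector space model: TF-IDF term weighting plus cosine similarity between a query and each document. The function names, the smoothed IDF `1 + log(N/df)`, the toy documents, and the assumption that text is already tokenized are illustrative choices, not details taken from the thesis.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency: 1 + log(N / df) (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: 1.0 + math.log(n / d) for t, d in df.items()}


def weigh(tokens, idf):
    """TF-IDF weights for one token list; terms unseen in the collection are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending relevance."""
    idf = idf_table(docs)
    q = weigh(query, idf)
    scores = [(i, cosine(q, weigh(doc, idf))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy collection, already tokenized.
docs = [
    ["web", "search", "engine"],
    ["information", "filtering", "user", "profile"],
    ["personalized", "web", "search"],
]
query = ["personalized", "search"]
print([i for i, _ in rank(query, docs)])  # prints [2, 0, 1]
```

The document containing both query terms ranks first, and the document that shares no terms with the query scores 0.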
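    To address some of the problems in existing retrieval systems, and building on current information filtering techniques, this thesis combines content-based filtering with collaborative filtering and proposes a framework for an information filtering system. The main research content consists of three parts: information retrieval and information filtering; a hybrid filtering algorithm; and relative precision and relative recall.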
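    The thesis first proposes a model framework for a personalized search engine and describes its working principle. Building on a meta-search engine, it introduces information filtering technology into the search engine to provide intelligent, personalized search services.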
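The abstract gives no details of how results are merged or filtered, so the following Python sketch is only one plausible reading of "a meta-search engine plus information filtering". It assumes that each underlying engine returns a ranked list of URLs, fuses those lists with a Borda-style count normalized to [0, 1], and then re-ranks the fused results by their term overlap with a keyword-set user profile. The engine outputs, snippets, profile, `weight` parameter, and both scoring rules are hypothetical.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Borda-style fusion of several ranked URL lists; scores lie in [0, 1]."""
    scores = defaultdict(float)
    for results in result_lists:
        n = len(results)
        for rank, url in enumerate(results):
            # Higher positions earn a larger share; averaging keeps scores in [0, 1].
            scores[url] += (n - rank) / n / len(result_lists)
    return dict(scores)


def personalize(merged, snippets, profile, weight=0.5):
    """Blend the fused score with the fraction of profile terms found in the snippet."""
    ranked = []
    for url, fused in merged.items():
        terms = set(snippets.get(url, "").lower().split())
        overlap = len(terms & profile) / len(profile) if profile else 0.0
        ranked.append((url, (1 - weight) * fused + weight * overlap))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


# Hypothetical outputs of two underlying engines and their result snippets.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4"]
snippets = {
    "u1": "stock market news",
    "u2": "football scores",
    "u3": "information filtering for web search",
    "u4": "collaborative filtering tutorial",
}
profile = {"filtering", "web", "search"}  # hypothetical user interest profile

merged = merge_results([engine_a, engine_b])
print([url for url, _ in personalize(merged, snippets, profile)])
# prints ['u3', 'u2', 'u4', 'u1']
```

Without the profile, `u3` would rank near the bottom; with it, the result that matches the user's interests moves to the top. A linear blend keeps the two signals separately tunable through `weight`.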
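    The relevance-computation function is the core of an information filtering system because it determines the accuracy of the system's predictions. This thesis combines a content-based filtering algorithm with a collaborative filtering algorithm to design a new hybrid filtering algorithm.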
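The abstract does not specify the hybrid algorithm, so this is not the thesis's method. The following Python sketch shows one common way to combine the two approaches: a linear blend of a content-based score (cosine similarity between a user's term profile and a document's term vector) and a user-based collaborative score (similarity-weighted average of other users' ratings for the document). The `alpha` value, the assumption that ratings are normalized to [0, 1], and all sample data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_score(profile, doc_terms):
    """Content-based part: match between the user's term profile and the document."""
    return cosine(profile, doc_terms)


def collaborative_score(user, doc, ratings):
    """User-based CF: similarity-weighted mean of other users' ratings of `doc`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or doc not in theirs:
            continue
        sim = cosine(ratings[user], theirs)  # users compared on their rating vectors
        num += sim * theirs[doc]
        den += abs(sim)
    return num / den if den else 0.0


def hybrid_score(user, doc, profile, doc_terms, ratings, alpha=0.6):
    """Linear blend; `alpha` weights the content-based part (assumed value)."""
    return (alpha * content_score(profile, doc_terms)
            + (1 - alpha) * collaborative_score(user, doc, ratings))


# Hypothetical ratings in [0, 1]; alice has not yet rated d3.
ratings = {
    "alice": {"d1": 1.0, "d2": 0.0},
    "bob": {"d1": 1.0, "d2": 0.0, "d3": 1.0},
    "carol": {"d1": 0.0, "d2": 1.0, "d3": 0.0},
}
profile = {"filtering": 1.0, "web": 0.5}  # alice's term profile
d3_terms = {"filtering": 1.0, "web": 1.0}  # term vector of d3
print(round(hybrid_score("alice", "d3", profile, d3_terms, ratings), 4))  # prints 0.9692
```

Here bob, whose ratings resemble alice's, liked `d3`, while carol, whose ratings do not, contributes nothing, so the collaborative part is 1.0. The content part can still score documents nobody has rated yet, which is a usual motivation for hybrid filtering.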
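    In large-scale Internet information retrieval, high recall yields thousands of "hit" pages, which places a heavy burden on users. Recall and precision are therefore not well suited to measuring retrieval effectiveness at Internet scale, so this thesis proposes two new measures: relative recall and relative precision.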
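The abstract does not define relative recall and relative precision, so they are not reproduced here. For reference, the following Python sketch computes the conventional set-based precision and recall that the thesis considers ill-suited to Web-scale search. The function name and toy data are illustrative. Recall also requires the complete set of relevant documents, which is rarely known for the Web.

```python
def precision_recall(retrieved, relevant):
    """Conventional set-based precision and recall of one result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # share of relevant docs that were found
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}  # hypothetical ground truth
retrieved = ["d1", "d2", "d5", "d6", "d7", "d8", "d9", "d10"]
print(precision_recall(retrieved, relevant))  # prints (0.25, 0.5)
```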
With the rapid development of the Internet, the information on it grows exponentially. Whether we can make use of this huge information resource depends on how well users can find the content they are most interested in. This has become the key problem of searching for information on the Web, and it is also the research goal of this thesis.
    This thesis presents the main issues of information retrieval, including the system structure of information filtering, the characteristics and classification of filtering systems, the relation between retrieval and filtering, common models of filtering systems, and evaluation metrics for system performance. It discusses information retrieval as a whole and points out the problems in current information retrieval systems.
    To address the problems in current information retrieval (IR) systems, and building on existing information filtering (IF) technology, this thesis improves the filtering algorithm and designs and implements a personalized filtering system that is intelligent, autonomous, and extensible. The new system adopts a composite filtering approach. The main research issues of this thesis are: retrieval and filtering; a composite filtering algorithm; and relative recall and relative precision.
    First, a model of a personalized search engine is presented together with its working principle. Building on a meta-search engine, the thesis applies information filtering technology to the search engine to provide personalized and intelligent search services.
    The filtering algorithm is the core of a personalized filtering system, since personalization is achieved through the filtering approach. This thesis designs a composite filtering algorithm.
    In Internet information retrieval, high recall produces an enormous number of results, and users still have to search through them for the information they need. When a search engine provides personalized retrieval on the Web, recall and precision are no longer suitable measures, so this thesis proposes relative recall and relative precision.
