Application of Latent Semantic Indexing to Aircraft Fault Case Retrieval
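Abstract
Airlines want to search a database of fault cases using a description of an aircraft fault, that is, to retrieve fault cases similar to the input fault description and use them to guide fault isolation on the aircraft.
     After introducing the basic principles of the latent semantic indexing (LSI) model, this thesis discusses in detail how to apply the model to an aircraft fault case retrieval system. For preprocessing the fault cases, it proposes an n-gram-based statistical indexing method, designed around the characteristics of information retrieval, to index the fault cases automatically. Using the retrieval performance of the system as the criterion, extensive experiments were carried out to determine the parameters of the LSI model, including the weighting scheme, the value of K, and the similarity threshold.
     Based on the experimental results, a fault case retrieval system was designed and developed, and the system was then optimized.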
Latent semantic indexing (LSI) is a fully automatic yet intelligent indexing method that is widely applicable and a promising way to improve users' access to many kinds of textual material. LSI tries to overcome the problems of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval.
    In this thesis, LSI is introduced into case retrieval and its application is presented. To preprocess the text of aircraft fault cases, a new automatic indexing method based on n-grams is proposed. Extensive experiments are carried out to tune the parameters of the LSI model so that the case retrieval system achieves good retrieval performance.
    An LSI-based case retrieval system is developed, and further work is done to optimize it.
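The pipeline summarized above (n-gram indexing, a weighted term-by-case matrix, a rank-K reduction, and a similarity threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the character n-gram lengths (2 and 3), the document-frequency filter `min_df`, the log-tf times idf weighting, the values `k=2` and `threshold=0.5`, the use of NumPy, and the toy English fault descriptions. The thesis determines its actual weighting scheme, K value, and threshold experimentally, and the abstract does not state them.

```python
import math
from collections import Counter

import numpy as np


def char_ngrams(text, n_values=(2, 3)):
    """Return all character n-grams of the given lengths, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + n] for n in n_values for i in range(len(chars) - n + 1)]


def build_vocab(cases, min_df=2):
    """Keep n-grams found in at least `min_df` cases (an assumed statistical filter)."""
    doc_freq = Counter()
    for case in cases:
        doc_freq.update(set(char_ngrams(case)))
    kept = sorted(g for g, df in doc_freq.items() if df >= min_df)
    vocab = {g: i for i, g in enumerate(kept)}
    idf = np.array([math.log(len(cases) / doc_freq[g]) + 1.0 for g in kept])
    return vocab, idf


def weighted_vector(text, vocab, idf):
    """Log term frequency times inverse document frequency (one possible weighting)."""
    vec = np.zeros(len(vocab))
    for gram, count in Counter(char_ngrams(text)).items():
        if gram in vocab:
            vec[vocab[gram]] = math.log(1 + count) * idf[vocab[gram]]
    return vec


class LsiRetriever:
    """Hypothetical helper: projects cases and queries onto the top-k left singular vectors."""

    def __init__(self, cases, k=2, min_df=2):
        self.cases = list(cases)
        self.vocab, self.idf = build_vocab(self.cases, min_df)
        # Term-by-case matrix A: one row per n-gram, one column per fault case.
        a = np.column_stack(
            [weighted_vector(c, self.vocab, self.idf) for c in self.cases]
        )
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        k = min(k, len(s))
        self.u_k = u[:, :k]
        # Each row holds one case's coordinates in the k-dimensional latent space.
        self.doc_vecs = (self.u_k.T @ a).T

    def search(self, query, threshold=0.5):
        """Return (cosine similarity, case) pairs at or above the threshold, best first."""
        q = self.u_k.T @ weighted_vector(query, self.vocab, self.idf)
        q_norm = np.linalg.norm(q)
        hits = []
        for j, d in enumerate(self.doc_vecs):
            denom = q_norm * np.linalg.norm(d)
            sim = float(q @ d / denom) if denom > 0 else 0.0
            if sim >= threshold:
                hits.append((sim, self.cases[j]))
        return sorted(hits, reverse=True)


# Invented toy fault descriptions, used only to exercise the sketch.
cases = [
    "left engine oil pressure low",
    "engine oil pressure low warning in cruise",
    "cabin pressure controller fault",
    "cabin pressure warning light on",
]

if __name__ == "__main__":
    retriever = LsiRetriever(cases, k=2)
    for sim, case in retriever.search("oil pressure low on engine"):
        print(f"{sim:.3f}  {case}")
```

Because queries and cases are projected into the same reduced space, a query can match a case that shares few exact terms with it. This is the problem with lexical matching that LSI is meant to address. In practice `k` and `threshold` would be tuned against retrieval performance, as the abstract describes.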