Abstract
Conventional information retrieval methods often achieve relatively low precision when deployed inside enterprises, and supervised learning approaches suffer from a shortage of training data. To address both problems, this study proposes a query expansion method based on enterprise knowledge domain categories and semantic relevance. The method first applies a topic model to the enterprise document collection. It then combines the topic model output with expert input to build a classification of enterprise knowledge domains, each with a set of weighted description terms. Finally, it classifies each query into a knowledge domain category by semantic similarity and selects expansion terms from that category's description terms. Experiments on real data from an electronics manufacturing company show that the expanded queries reflect users' information needs more accurately and effectively improve the precision of enterprise information retrieval.
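The classification and expansion steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The category names, term weights, and term-pair similarity table are hypothetical stand-ins for the description term sets that the paper derives from a topic model and expert review. The scoring rule (the average, over query terms, of the best weighted match in a category) and the number of expansion terms `k` are also assumptions, since the abstract specifies neither how semantic similarity is computed nor how many terms are added.

```python
from typing import Dict, List, Tuple

# Hypothetical category description term sets: category -> {term: weight}.
# In the paper these are derived from a topic model plus expert review;
# the names and weights below are invented for illustration only.
CATEGORIES: Dict[str, Dict[str, float]] = {
    "quality": {"defect": 0.9, "inspection": 0.8, "yield": 0.6, "rework": 0.5},
    "supply_chain": {"supplier": 0.9, "procurement": 0.7, "inventory": 0.6, "lead_time": 0.5},
}

# Hypothetical term-term semantic similarities in [0, 1], keyed by a sorted pair.
SIMS: Dict[Tuple[str, str], float] = {("defect", "fault"): 0.8}


def term_similarity(a: str, b: str, sims: Dict[Tuple[str, str], float]) -> float:
    """1.0 for identical terms, otherwise a table lookup defaulting to 0.0."""
    if a == b:
        return 1.0
    x, y = sorted((a, b))
    return sims.get((x, y), 0.0)


def category_score(query: List[str], desc: Dict[str, float],
                   sims: Dict[Tuple[str, str], float]) -> float:
    """Average over query terms of the best weighted match among description terms."""
    if not query:
        return 0.0
    total = 0.0
    for q in query:
        total += max((w * term_similarity(q, t, sims) for t, w in desc.items()), default=0.0)
    return total / len(query)


def classify_query(query, categories, sims) -> str:
    """Assign the query to the category with the highest score."""
    return max(categories, key=lambda c: category_score(query, categories[c], sims))


def expand_query(query, categories, sims, k: int = 2):
    """Classify the query, then append the k highest-weighted new description terms."""
    best = classify_query(query, categories, sims)
    candidates = sorted(
        ((t, w) for t, w in categories[best].items() if t not in query),
        key=lambda tw: (-tw[1], tw[0]),  # weight descending, then alphabetical
    )
    return best, query + [t for t, _ in candidates[:k]]


if __name__ == "__main__":
    print(expand_query(["fault", "inspection"], CATEGORIES, SIMS))
```

Running the module prints `('quality', ['fault', 'inspection', 'defect', 'yield'])`. The query term "fault" appears in no description term set, but its hypothetical similarity to "defect" still routes the query to the quality category. This is where a semantic match, rather than a purely lexical one, is meant to help.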