Research on Natural Language Query Processing for Chinese Databases
Abstract
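A natural language interface to databases (NLIDB) lets users access a database in natural language. It sits at the intersection of several disciplines and draws on research in natural language processing, database systems, artificial intelligence, and human-computer interfaces. Over more than thirty years, NLIDB research has made great progress. However, such systems have not yet been widely deployed, and many technical problems still need further study.
    This thesis attempts to solve several key problems of NLIDBs with a complete set of language-processing logic grounded in database semantics. In its research approach, the author emphasizes combining relevant knowledge from the various disciplines, in order to overcome the shortcomings of the methods used by the existing artificial-intelligence and database schools.
    The thesis first gives a formal definition and classification of NLIDBs. It then uses a general abstract model to delimit the scope of NLIDB research and reinterprets the key problems of NLIDBs from a theoretical standpoint. On this basis it establishes two lines of system research on Chinese natural language interfaces to databases: Chiql, a template-based Chinese query language, and NChiql, a Chinese natural language query system based on restricted Chinese.
    The thesis proposes a system architecture for NChiql whose design emphasizes portability, usability, adaptability, robustness, and intelligence. For the composition and representation of knowledge, it proposes unifying linguistic knowledge, domain knowledge, and database knowledge, and on that basis builds a semantic conceptual model (SCM). For knowledge acquisition, it provides a dual mechanism of static and dynamic extraction. For Chinese natural language query processing, it proposes a word segmentation method based on database semantics. This method uses a backtracking mechanism, a related-semantics determination method, and general disambiguation rules to resolve ambiguous segmentation, ambiguous words, and unknown words. After analyzing and evaluating syntactic parsing methods, it presents one suited to Chinese natural language que…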
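The word segmentation method summarized above combines backtracking with rules that prefer readings that make sense against the database. The abstract does not spell out the thesis's actual rules, so the Python below is only a minimal sketch under stated assumptions. The lexicon stands in for vocabulary extracted from a hypothetical student database. The `score` function is a made-up stand-in for the thesis's related-semantics determination and disambiguation rules: it prefers the fewest unknown words, then the fewest words, then the most words that denote database objects. The sketch shows how backtracking exposes the ambiguity between 计算机系/学生 ("computer science department / student") and 计算机/系/学生, and how unmatched characters, such as a personal name, are grouped into a single unknown word.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical lexicon for a toy student database. Each surface word is tagged
# with the database object or role it denotes; this format is an assumption
# for illustration, not the dictionary format used in NChiql.
LEXICON: Dict[str, str] = {
    "学生": "table:student",           # "student"
    "姓名": "column:student.name",     # "name"
    "年龄": "column:student.age",      # "age"
    "系": "column:student.dept",       # "department"
    "计算机系": "value:student.dept",  # "computer science department" (a stored value)
    "计算机": "concept:computer",      # "computer" (general word, no DB object)
    "大于": "op:>",                    # "greater than"
    "的": "function:de",               # possessive particle
}
MAX_WORD_LEN = max(len(w) for w in LEXICON)
DB_KINDS = {"table", "column", "value"}


def candidates(text: str, start: int) -> Iterator[str]:
    """Yield lexicon words that begin at `start`, longest first."""
    for length in range(min(MAX_WORD_LEN, len(text) - start), 0, -1):
        word = text[start:start + length]
        if word in LEXICON:
            yield word


def segmentations(text: str, start: int = 0) -> Iterator[List[str]]:
    """Backtracking search over every split of text[start:] into lexicon words.

    If no lexicon word begins at a position, that character is emitted as a
    one-character unknown token, so the search never dead-ends.
    """
    if start == len(text):
        yield []
        return
    matched = False
    for word in candidates(text, start):
        matched = True
        # Commit to `word`, explore the remainder, then backtrack to the
        # next, shorter candidate.
        for rest in segmentations(text, start + len(word)):
            yield [word] + rest
    if not matched:
        for rest in segmentations(text, start + 1):
            yield [text[start]] + rest


def merge_unknown(tokens: List[str]) -> List[str]:
    """Join runs of consecutive unknown characters into one unknown word."""
    merged: List[str] = []
    run = ""
    for tok in tokens:
        if tok in LEXICON:
            if run:
                merged.append(run)
                run = ""
            merged.append(tok)
        else:
            run += tok
    if run:
        merged.append(run)
    return merged


def score(tokens: List[str]) -> Tuple[int, int, int]:
    """Hypothetical disambiguation rule (lower is better): fewest unknown
    words, then fewest words, then the most words denoting database objects."""
    unknown = sum(1 for t in tokens if t not in LEXICON)
    db_words = sum(1 for t in tokens if LEXICON.get(t, "").split(":")[0] in DB_KINDS)
    return (unknown, len(tokens), -db_words)


def segment(text: str) -> List[str]:
    """Return the best-scoring segmentation among all backtracking candidates."""
    return min((merge_unknown(s) for s in segmentations(text)), key=score)


if __name__ == "__main__":
    print(segment("计算机系学生的姓名"))  # ['计算机系', '学生', '的', '姓名']
    print(segment("张三的年龄"))          # ['张三', '的', '年龄']
```

Exhaustive backtracking is exponential in the worst case. That is tolerable for short database queries, but longer inputs would call for dynamic programming over the word lattice instead.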
Natural language interfaces to databases (NLIDBs) let users access information stored in databases directly in natural language. NLIDBs draw on many disciplines, including artificial intelligence (AI), natural language processing (NLP), databases (DB), and human-computer interaction (HCI). The field has made significant advances over the past thirty years. Even so, NLIDB systems have not gained rapid or wide commercial acceptance, because of problems with portability and usability.
    This thesis attempts to develop a new methodology based on database semantics to solve the key problems in NLIDBs. I argue that previous approaches to NLIDBs are problematic, mainly because they do not draw sufficiently on the combined knowledge of different disciplines.
    This thesis first presents a formal definition and classification of NLIDBs. It then gives a general abstract model of NLIDBs that outlines the research scope and highlights the key and difficult problems. Building on this discussion, our project develops two kinds of Chinese natural language interfaces: Chiql, a template-based system, and NChiql, a system based on restricted natural language.
    This thesis presents a portable architecture for NChiql that emphasizes portability, usability, adaptability, robustness, and intelligence. To achieve these goals, it introduces a semantic conceptual model (SCM) that integrates linguistic, domain-specific, and database knowledge. The SCM is constructed through both static and dynamic knowledge acquisition.
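To make the idea of integrating linguistic, domain, and database knowledge more concrete, here is a minimal Python sketch of what an SCM-like structure could look like. It is not the SCM as defined in the thesis. Every class and method name (`SemanticConceptualModel`, `extract_static`, `learn_synonym`, `lookup`) is hypothetical. The sketch also rests on two assumptions. First, static acquisition reads table and column names from the schema, together with a table of Chinese labels supplied by whoever ports the system to a new database. Second, dynamic acquisition records new words that users employ for concepts the system already knows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DbObject:
    """Database knowledge: where a concept lives in the schema."""
    table: str
    column: Optional[str] = None  # None: the concept denotes the whole table


@dataclass
class Concept:
    """A domain concept tied to its database object and to the surface words
    that express it (linguistic knowledge)."""
    name: str
    db_object: DbObject
    words: List[str] = field(default_factory=list)


class SemanticConceptualModel:
    """Hypothetical SCM-like store; not the structure defined in the thesis."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.word_index: Dict[str, str] = {}  # surface word -> concept name

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept
        for word in concept.words:
            self.word_index[word] = concept.name

    def extract_static(self, schema: Dict[str, List[str]],
                       labels: Dict[str, str]) -> None:
        """Static acquisition (assumed): one concept per table and per column.

        `labels` maps "table" or "table.column" to a Chinese word; it is
        assumed to be supplied when the system is ported to a new database.
        """
        for table, columns in schema.items():
            self.add_concept(Concept(table, DbObject(table), [labels[table]]))
            for column in columns:
                key = f"{table}.{column}"
                self.add_concept(Concept(key, DbObject(table, column), [labels[key]]))

    def learn_synonym(self, word: str, concept_name: str) -> None:
        """Dynamic acquisition (assumed): remember a new word for a known
        concept, e.g. after a user confirms what an unrecognized word meant."""
        self.concepts[concept_name].words.append(word)
        self.word_index[word] = concept_name

    def lookup(self, word: str) -> Optional[DbObject]:
        """Map a surface word to its database object, or None if unknown."""
        name = self.word_index.get(word)
        return self.concepts[name].db_object if name is not None else None


if __name__ == "__main__":
    scm = SemanticConceptualModel()
    scm.extract_static(
        {"student": ["name", "age", "dept"]},
        {"student": "学生", "student.name": "姓名",
         "student.age": "年龄", "student.dept": "系"},
    )
    scm.learn_synonym("名字", "student.name")  # colloquial word for "name"
    print(scm.lookup("名字"))  # DbObject(table='student', column='name')
    print(scm.lookup("学生"))  # DbObject(table='student', column=None)
```

The word index is kept separate from the concepts. As a result, a synonym learned at run time is visible to the next lookup without re-extracting the schema.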
    Based on the domain concepts and database semantics in the SCM, this thesis describes a word segmentation algorithm that handles lexical ambiguity and unknown words by applying backtracking, related semantic…