Research on Keyword Search in Structured and Semi-structured Data (对结构化和半结构化数据的关键字搜索研究)

English title: Research on Keyword Search in Structured and Semi-structured Data
Author: 许建军
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords: keyword search; structured data; semi-structured data; XML
Degree year: 2007
Advisor: 施伯乐
Discipline code: 081202
Degree-granting institution: Fudan University
Submission date: 2007-04-10

Abstract

Keyword search is currently the most popular method of information discovery: users need not learn a complex query language or know the structure of the underlying data, and only need to supply a few keywords to express their information need. Over the past decade or so, keyword search over unstructured data has been studied extensively. As the volume of structured data (typified by relational data) and semi-structured data (typified by XML data) keeps growing, attention has turned to keyword search over these two kinds of data. Building on existing work, this dissertation focuses on the efficiency and effectiveness of keyword search in structured and semi-structured data, investigates the problems in existing approaches, and proposes new solutions. Its main contributions are as follows:
     1. The most popular approach to keyword search in relational data is based on search-time join. This dissertation studies the core problem of that approach, the search algorithm for join expressions on the schema graph. It proposes a search algorithm with polynomial delay and gives a proof of its correctness and an analysis of its time complexity.
     2. This dissertation proposes a pre-join-based method of keyword search in relational data. It analyzes the problems that may arise when keyword search is introduced into relational databases, such as the physical scattering of the different parts of a piece of information. It then defines a search result as a Complete Tuple Graph (CTG) that contains all query keywords and treats the CTG as the granularity of indexing and searching. On this basis it designs an efficient search algorithm based on merge sort and gives a relevance ranking method for the result set. It also gives a concrete method for index maintenance.
     3. This dissertation proposes an MIU-based method of keyword search in XML data. It analyzes the problems that may arise from refining the granularity of results in XML keyword search, defines the Minimal Information Unit (MIU), and presents a method for partitioning an arbitrary XML document into MIUs. Taking the MIU as the smallest granularity of indexing and searching, it designs compact index structures and the corresponding search algorithms.
     The dissertation reports experimental data for each of these contributions. The results show that the proposed methods improve, to varying degrees, on previously proposed methods in both efficiency and effectiveness, and that they have promising applications in both scientific research and commercial settings.
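
As a rough illustration of contributions 2 and 3, the sketch below shows how a keyword query might be answered over an inverted index whose entries are indexing units (CTGs or MIUs in the dissertation's terms) by merging sorted posting lists. It rests on assumptions that are not taken from the dissertation: the unit IDs, the toy data, and the function names `build_index` and `search_all_keywords` are hypothetical, and the merge-style intersection is a standard technique standing in for the author's algorithm, whose details the abstract does not give.

```python
from collections import defaultdict


def build_index(units):
    """Map each keyword to a sorted list of the unit IDs that contain it.

    `units` is a hypothetical dict {unit_id: text}; a unit stands in for
    whatever the index treats as its smallest granule.
    """
    postings = defaultdict(set)
    for unit_id, text in units.items():
        for word in text.lower().split():
            postings[word].add(unit_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search_all_keywords(index, keywords):
    """Return the sorted unit IDs whose text contains every keyword.

    Merge-style intersection: one cursor per sorted posting list, always
    advancing the cursors that lag behind the current maximum head.
    """
    lists = [index.get(k.lower(), []) for k in keywords]
    if not lists or any(not lst for lst in lists):
        return []
    cursors = [0] * len(lists)
    result = []
    while all(c < len(lst) for c, lst in zip(cursors, lists)):
        heads = [lst[c] for c, lst in zip(cursors, lists)]
        target = max(heads)
        if all(h == target for h in heads):
            # Every list agrees on this unit, so it contains all keywords.
            result.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Skip entries that cannot match because they are below target.
            cursors = [c + 1 if h < target else c
                       for c, h in zip(cursors, heads)]
    return result


# Toy data (assumed, for illustration only).
units = {
    1: "keyword search relational data",
    2: "xml keyword search",
    3: "relational join",
}
index = build_index(units)
hits = search_all_keywords(index, ["keyword", "relational"])
print(hits)
```

Keeping each posting list sorted lets the intersection run in a single forward pass over the lists. Relevance ranking of the hits and index maintenance, both of which the dissertation addresses, are omitted here.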
