XML数据实体同一性相关技术的研究

英文题名：Research on the Techniques of Entity Identity on XML Data
作者：刘显敏
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：实体同一性 ; XML数据 ; 实体抽取 ; 实体匹配 ; 消解算法
英文关键词：entity identity ; XML data ; entity extraction ; entity matching ; resolving algo-
英文关键词：rithm
学位年度：2013
导师：李建中
学科代码：081202
学位授予单位：哈尔滨工业大学
论文提交日期：2013-07-01

摘要

近年来，劣质数据广泛出现于信息社会的各个领域，引发了很多问题并带来了巨大损失。关注该问题的数据可用性研究在国内外已经展开。实体同一性是数据可用性的重要维度之一。实体同一性基于数据库中存储的数据实体和现实世界中的物理实体定义。一个数据实体描述的是某个物理实体，是其在数据库中的表示形式；一个数据库被称作是满足实体同一性要求的，当且仅当数据库中没有任何两个数据实体描述的是同一个物理实体。对数据实体同一性的研究是当前数据处理领域的热点研究问题之一。针对关系数据的实体同一性研究已经有很多工作，然而，其中大部分的理论和方法并不适用于非关系数据，并且很难扩展到非关系数据上。针对非关系数据的实体同一性研究工作还很少，尚处于起步阶段。
     本文针对一类广泛使用的非关系数据，即XML数据，以完善XML数据可用性的管理技术为目标，从XML实体抽取、XML实体匹配以及实体匹配结果消解等问题切入，重点研究了XML数据实体同一性相关的技术。本文的主要工作可以概括如下：
     首先，本文提出并研究了XML数据上的实体抽取问题，提出了一种基于规则的实体自动抽取方法KEE。XML数据中没有实体的明显标识，且现有的实体同一性研究工作并没有考虑实体抽取问题，因此实体抽取是XML实体同一性研究的基础之一。提出的KEE方法利用XML查询描述实体，为实体提供了简洁的表示方法，避免了逐一表示XML实体的不便；允许用户利用键规则只描述感兴趣的少量实体，并自动地为用户寻找感兴趣的其它实体；利用查询松弛技术，克服了在异构数据上自动寻找相似实体时实体难以枚举、难以寻找的困难；基于自动机技术，利用共享中间计算结果的思想，实现了高效的XML数据实体抽取算法。从理论角度分析了KEE方法的性能，并用实验验证了该算法能够有效且高效地解决实体抽取问题。
     第二，本文研究了XML实体匹配问题，以提高XML实体匹配效率为目标，提出了基于哈希方法的XML实体匹配算法。给定实体间的相似函数，实体匹配是要找出所有相似函数值大于某个阈值的实体对，是检测数据实体同一性错误的重要基础。XML实体匹配要同时处理数据中结构和内容两部分信息，现有的技术仅关注内容之间的相似性，无法高效解决XML实体匹配问题。提出的基于局部敏感哈希的实体匹配方法把相似的实体以很高的概率映射到一个分组，仅两两计算同一分组内实体的相似函数值，大大提高了实体匹配的效率；考虑不同应用，抽象出三类实体相似函数定义，证明三类函数均具备局部敏感特性，并给出对应的哈希策略；基于三类函数的局部敏感哈希策略，给出了对应的实体匹配算法。从理论角度分析并证明了算法的有效性，并用实验验证了匹配算法的实际性能。
     第三，本文研究了实体匹配结果的消解问题，以求解更具意义的消解方式为目标，提出了两种形式化问题定义，并分别给出理论分析及算法。实体匹配的最终目标是解决实体同一性错误检测问题，即实体识别问题。将实体匹配结果转化为实体识别结果的问题就是实体匹配结果消解问题。本文基于融合多个匹配算法以及对不同实体对的匹配结果置信度不同的思想，形式化地定义了两种消解问题。针对最小化图代价的定义，从理论上证明了消解问题是NP完全问题；利用线性归约，给出近似算法；针对特殊情况，给出近似比更优的算法。针对最小化边权值的定义，证明该问题是NP完全问题；给出基于求解线性规划的近似算法；针对特殊情况，给出具有更优近似比的随机近似算法；针对大规模数据的情况，给出了四种启发式算法；用实验验证算法的性能，并对比不同算法的时间效率。
     最后，本文针对提出的实体抽取方法和匹配结果消解方法的不足，研究了两个相关的优化问题，分别从理论角度进行了分析。本文给出的实体抽取方法基于语义规则，但在某些应用中，用户无法给出规则。因此，本文以探究基于用户示例自动推断规则的可行性为目的，形式化地定义了XML查询学习问题。考虑四类不同查询，从理论角度分析其对应学习问题，对可解问题给出了多项式时间算法，对难解问题给出了复杂性证明并进一步分析其是否存在有效的近似算法。针对最小化图代价匹配结果消解问题，本文给出的近似算法的近似性能无法满足实际需求。因此，本文以近年来兴起的参数化复杂度理论为工具，研究该问题是否固定参数可解。给出了该问题的三种参数化定义，针对其中两个问题给出了有效的固定参数算法，并证明了另一个问题是固定参数难解的。
Recently, dirty data has been widely viewed in various fields of information society,caused many problems and made great economic loss. There have been many researcheforts on data usability in our country and abroad. Entity identity is defined based onthe concepts of data entity and real entity. A data entity describes some real entity and isthe representation of the real entity in database. A database satisfies the requirements ofentity identity, if and only if there are no two data entities describing the same real entity.Data usability is one of the most popular research problems in data processing area. Therehave been many works of entity identity on relational data, however, most of them canneither applied nor extended to non-relational data. In fact, there are only few works ondata usability of non-relational data.
     Focusing on XML data which is a kind of widely used non-relational data, to en-hance the techniques of data usability management, this thesis studies the problems ofXML entity extraction, XML entity matching and matching results resolution, and devel-ops the related techniques on XML entity identity. The main works can be summarizedas follows.
     First, the entity extraction problem on XML data is firstly defined and studied, and arule-based method KEE is proposed to extract entities from XML data automatically. InXML data, there are no obvious entities, and no previous works consider entity extractionproblem, therefore, it is one of the fundamental problems in XML entity identity research.The KEE method proposed utilizes queries to represent XML entities which is concise andavoids enumerating the entities one by one. Users are allowed to provide small amountsof entities using XML key rules, and KEE will find out other interesting entities. Usingquery relaxation techniques, KEE can extract entities correctly with the existing of hetero-geneous structures in XML data. Utilizing the idea of sharing computation, an automatonbased implementation is introduced. Theoretical analysis and experimental results showthat the proposed method can extract XML entities efectively and efciently.
     Second, the XML entity matching problem is studied, and, to improve the efciencyof entity matching procedure, a hash based method is proposed. Given the entity similar-ity function, entity matching problem aims to discover all entity pairs which are similarenough, and is one of the fundamentals in detecting errors of entity identity. For XML entity matching, both structural and content information should be considered, however,previous works only focus on content information and can not match XML entities prop-erly. Using locality sensitive hashing techniques, the method proposed partitions givenentities into groups such that similar entities will be put into the same group with highprobabilities, only computes similarity scores for entities within the same group, and im-proves the efciency of matching. For diferent application settings, three kinds of entitysimilarity measuring functions are designed, and they are proved to be locality-sensitive.Based on the corresponding locality-sensitive hashing functions, algorithms of matchingXML entities for the three kinds of functions are designed. Theoretical analysis and ex-perimental results show that the proposed method can match XML entities efectively andefciently.
     Third, the problem of resolving entity matching results is studied to discover mean-ingful resolutions. Two formal definitions of resolving problem are utilized and the cor-responding theoretical analysis and algorithms are provided. The goal of entity matchingis to solve entity identification problem. The resolving problem of entity matching resultsaims to transform entity matching results to entity identification results. Based on theideas of merging multiple matching algorithms and using extra information on matchingresults, two kinds of resolving problems are formally defined. For the first one, the re-solving problem is proved to be NP-complete, and an L-reduction based approximationalgorithm is designed. For special cases, an algorithm with better approximating ratio isgiven. For the second one, the problem is proved to be NP-complete, and an LP based ap-proximation algorithm is designed. For special cases, a randomized algorithm with betterapproximating ratio is given, and to process large XML data, four heuristic algorithms arealso designed. Experimental results show that the proposed algorithms are efcient andefective.
     Forth, considering the drawbacks of proposed methods for entity extraction and re-solving problem, two related optimization problems are studied. The rules used by entityextraction method are hard to obtain sometimes. Therefore, to investigate the feasibilityof automatic methods for discovering rules, XML query learning problem is formally de-fined. Considering four diferent query classes, the query learning problems are studiedfrom theoretical aspects. For tractable cases, polynomial time algorithms are given, whilefor intractable cases, the computational complexity analysis and approximability analysisare provided. Given the observation that the approximation algorithms for the first resolv- ing problem are not suitable for practical applications, using parameterized complexitytheory, it is studied that whether the problem is fixed-parameter tractable. Given threeparameterized versions, two of them are proved to be fixed-parameter tractable, and thelast one is proved to be fixed-parameter intractable.

引文

[1] Redman T C. The Impact of Poor Data Quality on the Typical Enterprise[J]. Com-mun. ACM,1998,41(2):79–82.
    [2] Miller Jr D W, Yeast J D, Evans R L. Missing Prenatal Records at a Birth Cen-ter: A Communication Problem Quantified[C]//AMIA Annual Symposium Pro-ceedings. Washington, DC, USA: American Medical Informatics Association,2005:535-539.
    [3] Codd E F. A Relational Model of Data for Large Shared Data Banks[J]. Commun.ACM,1970,13(6):377–387.
    [4] Fellegi I P, Holt D. A Systematic Approach to Automatic Edit and Imputation[J].Journal of the American Statistical Association,1976,71(353):17–35.
    [5] Wang R Y. A Product Perspective on Total Data Quality Management[J]. Commun.ACM,1998,41(2):58–65.
    [6] Batini C, Scannapieco M. Data Quality: Concepts, Methodologies and Techniques(Data-Centric Systems and Applications)[M]. Secaucus, NJ, USA: Springer-VerlagNew York, Inc.,2006.
    [7] Herna′ndez M A, Stolfo S J. The Merge/Purge Problem for LargeDatabases[C]//Proceedings of the1995ACM SIGMOD international conferenceon Management of data (SIGMOD’95). New York, NY, USA: ACM,1995:127–138.
    [8] Hernandez M A, Stolfo S J. Real-World Data is Dirty: Data Cleansing and theMerge/Purge Problem[J]. Data Mining and Knowledge Discovery,1998,2(1):9–37.
    [9] Elmagarmid A K, Ipeirotis P G, Verykios V S. Duplicate Record Detection: ASurvey[J]. IEEE Trans. on Knowl. and Data Eng.,2007,19(1):1–16.
    [10] Brizan D G, Tansel A U. A Survey of Entity Resolution and Record LinkageMethodologies[J]. Communications of the IIMA,2006,6(3):41–50.
    [11] Koudas N, Sarawagi S, Srivastava D. Record Linkage: Similarity Measures andAlgorithms[C]//Proceedings of the2006ACM SIGMOD international conferenceon Management of data (SIGMOD’06). New York, NY, USA: ACM,2006:802–803.
    [12] Lim E P, Srivastava J, Prabhakar S, et al. Entity Identification in Database Integra-tion[C]//Proceedings of the Ninth International Conference on Data Engineering.Washington, DC, USA: IEEE Computer Society,1993:294–301.
    [13] Monge A, Elkan C. The Field Matching Problem: Algorithms and Applica-tions[C]//Proceedings of the second international Conference on Knowledge Dis-covery and Data Mining. Menlo Park, California: AAAI Press,1996:267–270.
    [14] Cohen W W. Data Integration Using Similarity Joins and a Word-Based Infor-mation Representation Language[J]. ACM Transactions on Information Systems(TOIS),2000,18(3):288–321.
    [15] Baxter R, Christen P, Churches T. A Comparison of Fast Blocking Methods forRecord Linkage[C]//ACM SIGKDD. NY, USA: ACM,2003:25–27.
    [16] Whang S E, Menestrina D, Koutrika G, et al. Entity Resolution with IterativeBlocking[C]//Proceedings of the2009ACM SIGMOD International Conferenceon Management of data (SIGMOD’09). New York, NY, USA: ACM,2009:219–232.
    [17] Benjelloun O, Garcia-Molina H, Menestrina D, et al. Swoosh: a Generic Approachto Entity Resolution[J]. The VLDB Journal,2009,18(1):255–276.
    [18] Chaudhuri S, Ganti V, Motwani R. Robust Identification of Fuzzy Dupli-cates[C]//Proceedings of the21st International Conference on Data Engineering(ICDE’05). Washington, DC, USA: IEEE Computer Society,2005:865–876.
    [19] Bohannon P, Fan W, Geerts F, et al. Conditional Functional Dependencies forData Cleaning[C]//Proceedings of the International Conference on Data Engineer-ing (ICDE). New York, NY, USA: IEEE,2007:746-755.
    [20] Cong G, Fan W, Geerts F, et al. Improving Data Quality: Consistency and Accu-racy[C]//Proceedings of the33rd international conference on Very large data bases(VLDB’07). Vienna, Austria: VLDB Endowment,2007:315–326.
    [21] Bravo L, Fan W, Ma S. Extending Dependencies with Conditions[C]//Proceedingsof the33rd international conference on Very large data bases (VLDB’07). Vienna,Austria: VLDB Endowment,2007:243–254.
    [22] Bravo L, Fan W, Geerts F, et al. Increasing the Expressivity of Conditional Func-tional Dependencies without Extra Complexity[C]//Proceedings of the2008IEEE24th International Conference on Data Engineering (ICDE’08). Washington, DC,USA: IEEE Computer Society,2008:516–525.
    [23] Fan W, Ma S, Hu Y, et al. Propagating Functional Dependencies with Condi-tions[J]. Proc. VLDB Endow.,2008,1(1):391–407.
    [24] Chiang F, Miller R J. Discovering Data Quality Rules[J]. Proc. VLDB Endow.,2008,1(1):1166–1177.
    [25] Fan W, Geerts F, Li J, et al. Discovering Conditional Functional Dependencies[J].IEEE Trans. on Knowl. and Data Eng.,2011,23(5):683–698.
    [26] Golab L, Karlof H, Korn F, et al. Sequential Dependencies[J]. Proc. VLDB En-dow.,2009,2(1):574–585.
    [27] Koudas N, Saha A, Srivastava D, et al. Metric Functional Dependen-cies[C]//Proceedings of the2009IEEE International Conference on Data Engi-neering (ICDE’09). Washington, DC, USA: IEEE Computer Society,2009:1275–1278.
    [28] Fan W. Dependencies Revisited for Improving Data Quality[C]//Proceedings ofthe twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principlesof database systems (PODS’08). New York, NY, USA: ACM,2008:159–170.
    [29] Korn F, Muthukrishnan S, Zhu Y. Checks and Balances: Monitoring Data QualityProblems in Network Trafc Databases[C]//Proceedings of the29th internationalconference on Very large data bases-Volume29(VLDB’03). Berlin, Germany:VLDB Endowment,2003:536–547.
    [30] Xiong H, Pandey G, Steinbach M, et al. Enhancing Data Analysis with NoiseRemoval[J]. IEEE Trans. Knowl. Data Eng.,2006,18(2):304–319.
    [31] Buneman P, Davidson S, Fan W, et al. Keys for XML[C]//Proceedings of the10thinternational conference on World Wide Web (WWW’01). New York, NY, USA:ACM,2001:201–210.
    [32] Fan W, Sime′on J. Integrity Constraints for XML[J]. Journal of Computer andSystem Sciences,2003,66(1):254–291.
    [33] Jefery S R, Garofalakis M, Franklin M J. Adaptive Cleaning for RFID DataStreams[C]//Proceedings of the32nd international conference on Very large databases (VLDB’06). Seoul, Korea: VLDB Endowment,2006:163–174.
    [34] Chen H, Ku W S, Wang H, et al. Leveraging Spatio-Temporal Redundancy forRFID Data Cleansing[C]//Proceedings of the2010ACM SIGMOD InternationalConference on Management of data (SIGMOD’10). New York, NY, USA: ACM,2010:51–62.
    [35] Borisov N, Babu S, Mandagere N, et al. Warding of the Dangers of Data Cor-ruption with Amulet[C]//Proceedings of the2011ACM SIGMOD InternationalConference on Management of data (SIGMOD’11). New York, NY, USA: ACM,2011:277–288.
    [36] Graefe G, Kuno H. Definition, Detection, and Recovery of Single-Page Failures, aFourth Class of Database Failures[J]. Proc. VLDB Endow.,2012,5(7):646–655.
    [37] van der Meyden R. Logics for Databases and Information Systems[M]. Norwell,MA, USA: Kluwer Academic Publishers,1998:307–356.
    [38] Imielin′ski T, Witold Lipski J. Incomplete Information in Relational Databases[J].Journal of the ACM (JACM),1984,31(4):761–791.
    [39] Vardi M. On the Integrity of Databases with Incomplete Informa-tion[C]//Proceedings of the fifth ACM SIGACT-SIGMOD symposium on Princi-ples of database systems (PODS’86). New York, NY, USA: ACM,1986:252–266.
    [40] Gottlob G, Zicari R. Closed World Databases Opened Through Null Val-ues[C]//Proceedings of the14th International Conference on Very Large DataBases (VLDB’88). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,1988:50–61.
    [41] Motro A. Integrity=Validity+Completeness[J]. ACM Transactions on DatabaseSystems (TODS),1989,14(4):480–502.
    [42] Levy A Y. Obtaining Complete Answers from IncompleteDatabases[C]//Proceedings of the22th International Conference on VeryLarge Data Bases (VLDB’96). San Francisco, CA, USA: Morgan KaufmannPublishers Inc.,1996:402–412.
    [43] Fan W, Geerts F. Relative Information Completeness[C]//Proceedings of thetwenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles ofdatabase systems (PODS’09). New York, NY, USA: ACM,2009:97–106.
    [44] Fan W, Geerts F. Capturing Missing Tuples and Missing Values[C]//Proceedingsof the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principlesof database systems (PODS’10). New York, NY, USA: ACM,2010:169–178.
    [45] Abiteboul S, Segoufin L, Vianu V. Representing and Querying XML with In-complete Information[J]. ACM Transactions on Database Systems (TODS),2006,31(1):208–254.
    [46] Barcelo P, Libkin L, Poggi A, et al. XML with Incomplete Information[J]. Journalof the ACM,2010,58(1):210–232.
    [47] Cheng R, Chen J, Xie X. Cleaning Uncertain Data with Quality Guarantees[J].Proc. VLDB Endow.,2008,1(1):722–735.
    [48] Chomicki J, Marcinkowski J. Minimal-Change Integrity Maintenance Using TupleDeletions[J]. Information and Computation,2005,197(1-2):90–121.
    [49] Schwalb E, Vila L. Temporal Constraints: A Survey[J]. Constraints,1998,3(2-3):129–149.
    [50] Zhang H, Diao Y, Immerman N. Recognizing Patterns in Streams with ImpreciseTimestamps[J]. Proc. VLDB Endow.,2010,3(1-2):244–255.
    [51] Cliford J, Dyreson C E, Isakowitz T, et al. On the Semantics of “Now” inDatabases[J]. ACM Trans. on Database Systems (TODS),1997,22(2):171–214.
    [52] Fan W, Geerts F, Wijsen J. Determining the Currency of Data[J]. ACM Trans.Database Syst.,2012,37(4):25:1–25:46.
    [53] Bright L, Raschid L. Using Latency-Recency Profiles for Data Delivery on theWeb[C]//Proceedings of the28th international conference on Very Large DataBases (VLDB’02). Hong Kong, China: VLDB Endowment,2002:550–561.
    [54] Cho J, Garcia-Molina H. Synchronizing a Database to Improve Freshness[J]. SIG-MOD Rec.,2000,29(2):117–128.
    [55] Newcombe H B, Kennedy J M, Axford S, et al. Automatic Linkage of VitalRecords[J]. Science,1959,130(3381):954–959.
    [56] Eellegi I P, Sunter A B. A Theory for Record Linkage[J]. J. Am. Statistical Asso-ciation,1969,64(328):1183–1210.
    [57] Arasu A, Chaudhuri S, Kaushik R. Transformation-based Framework for RecordMatching[C]//Proceedings of the2008IEEE24th International Conference onData Engineering (ICDE’08). Washington, DC, USA: IEEE Computer Society,2008:40–49.
    [58] Arasu A, Chaudhuri S, Kaushik R. Learning string transformations from exam-ples[J]. Proc. VLDB Endow.,2009,2(1):514–525.
    [59] Arasu A, Kaushik R. A Grammar-Based Entity Representation Framework forData Cleaning[C]//Proceedings of the2009ACM SIGMOD International Con-ference on Management of data (SIGMOD’09). New York, NY, USA: ACM,2009:233–244.
    [60] Arasu A, Re′C, Suciu D. Large-Scale Deduplication with Constraints Using Dedu-palog[C]//Proceedings of the2009IEEE International Conference on Data Engi-neering (ICDE’09). Washington, DC, USA: IEEE Computer Society,2009:952–963.
    [61] Fan W, Jia X, Li J, et al. Reasoning about record matching rules[J]. Proc. VLDBEndow.,2009,2(1):407–418.
    [62] Whang S E, Benjelloun O, Garcia-Molina H. Generic Entity Resolution with Neg-ative Rules[J]. The VLDB Journal,2009,18(6):1261–1277.
    [63] Whang S E, Garcia-Molina H. Entity Resolution with Evolving Rules[J]. Proc.VLDB Endow.,2010,3(1-2):1326–1337.
    [64] Chaudhuri S, Das Sarma A, Ganti V, et al. Leveraging Aggregate Constraintsfor Deduplication[C]//Proceedings of the2007ACM SIGMOD international con-ference on Management of data (SIGMOD’07). New York, NY, USA: ACM,2007:437–448.
    [65] Shen W, Li X, Doan A. Constraint-Based Entity Matching[C]//Proceedings of the20th national conference on Artificial intelligence-Volume2(AAAI’05). Pitts-burgh, Pennsylvania: AAAI Press,2005:862–867.
    [66] Fan W, Gao H, Jia X, et al. Dynamic Constraints for Record Matching[J]. TheVLDB Journal,2011,20(4):495–520.
    [67] Ananthakrishna R, Chaudhuri S, Ganti V. Eliminating Fuzzy Duplicates in DataWarehouses[C]//Proceedings of the28th international conference on Very LargeData Bases (VLDB’02). Hong Kong, China: VLDB Endowment,2002:586–597.
    [68] Guha S, Koudas N, Marathe A, et al. Merging the Results of Approximate MatchOperations[C]//Proceedings of the Thirtieth international conference on Very largedata bases-Volume30(VLDB’04). Toronto, Canada: VLDB Endowment,2004:636–647.
    [69] Chen Z, Kalashnikov D V, Mehrotra S. Adaptive Graphical Approach to EntityResolution[C]//Proceedings of the7th ACM/IEEE-CS joint conference on Digitallibraries (JCDL’07). New York, NY, USA: ACM,2007:204–213.
    [70] Singla P, Domingos P. Entity Resolution with Markov Logic[C]//Proceedings ofthe Sixth International Conference on Data Mining (ICDM’06). Washington, DC,USA: IEEE Computer Society,2006:572–582.
    [71] Koudas N, Marathe A, Srivastava D. Flexible String Matching against LargeDatabases in Practice[C]//Proceedings of the Thirtieth international conference onVery large data bases-Volume30(VLDB’04). Toronto, Canada: VLDB Endow-ment,2004:1078–1086.
    [72] Chaudhuri S, Ganti V, Kaushik R. A Primitive Operator for Similarity Joins in DataCleaning[C]//Proceedings of the22nd International Conference on Data Engineer-ing (ICDE’06). Washington, DC, USA: IEEE Computer Society,2006:5–16.
    [73] Behm A, Ji S, Li C, et al. Space-Constrained Gram-Based Indexing for EfcientApproximate String Search[C]//Proceedings of the2009IEEE International Con-ference on Data Engineering (ICDE’09). Washington, DC, USA: IEEE ComputerSociety,2009:604–615.
    [74] Papapetrou P, Athitsos V, Kollios G, et al. Reference-Based Alignment in LargeSequence Databases[J]. Proc. VLDB Endow.,2009,2(1):205–216.
    [75] Chaudhuri S, Ganti V, Xin D. Mining Document Collections to Facilitate AccurateApproximate Entity Matching[J]. Proc. VLDB Endow.,2009,2(1):395–406.
    [76] Xiao C, Wang W, Lin X, et al. Efcient Similarity Joins for Near Duplicate De-tection[C]//Proceedings of the17th international conference on World Wide Web(WWW’08). New York, NY, USA: ACM,2008:131–140.
    [77] Lieberman M D, Sankaranarayanan J, Samet H. A Fast Similarity Join AlgorithmUsing Graphics Processing Units[C]//Proceedings of the2008IEEE24th Interna-tional Conference on Data Engineering (ICDE’08). Washington, DC, USA: IEEEComputer Society,2008:1111–1120.
    [78] Yang X, Wang B, Li C. Cost-Based Variable-Length-Gram Selection for StringCollections to Support Approximate Queries Efciently[C]//Proceedings of the2008ACM SIGMOD international conference on Management of data (SIGMOD’08). New York, NY, USA: ACM,2008:353–364.
    [79] Li C, Wang B, Yang X. VGRAM: Improving Performance of Approximate Querieson String Collections using Variable-Length Grams[C]//Proceedings of the33rdinternational conference on Very large data bases (VLDB’07). Vienna, Austria:VLDB Endowment,2007:303–314.
    [80] Li C, Lu J, Lu Y. Efcient Merging and Filtering Algorithms for ApproximateString Searches[C]//Proceedings of the2008IEEE24th International Conferenceon Data Engineering (ICDE’08). Washington, DC, USA: IEEE Computer Society,2008:257–266.
    [81]王金宝,高宏,李建中, et al. RM树：一种支持字符串相似性操作的索引[J].计算机学报,2011,11(9):2124–2142.
    [82] Hassanzadeh O, Chiang F, Lee H C, et al. Framework for Evaluating Clustering Al-gorithms in Duplicate Detection[J]. Proc. VLDB Endow.,2009,2(1):1282–1293.
    [83] Hassanzadeh O, Miller R J. Creating Probabilistic Databases from DuplicatedData[J]. The VLDB Journal,2009,18:1141–1166.
    [84] Guo S, Dong X L, Srivastava D, et al. Record Linkage with Uniqueness Constraintsand Erroneous Values[J]. Proc. VLDB Endow.,2010,3(1-2):417–428.
    [85] Shu L, Long B, Meng W. A Latent Topic Model for Complete Entity Resolu-tion[C]//Proceedings of the2009IEEE International Conference on Data Engineer-ing (ICDE’09). Washington, DC, USA: IEEE Computer Society,2009:880–891.
    [86] Christen P. Automatic Record Linkage using Seeded Nearest Neighbour and Sup-port Vector Machine Classification[C]//Proceedings of the14th ACM SIGKDD in-ternational conference on Knowledge discovery and data mining (KDD’08). NewYork, NY, USA: ACM,2008:151–159.
    [87] Dong X, Halevy A, Madhavan J. Reference Reconciliation in Complex InformationSpaces[C]//Proceedings of the2005ACM SIGMOD international conference onManagement of data (SIGMOD’05). New York, NY, USA: ACM,2005:85–96.
    [88] Singla P, Domingos P. Collective Object Identification[C]//Proceedings of the19thinternational joint conference on Artificial intelligence (IJCAI’05). San Francisco,CA, USA: Morgan Kaufmann Publishers Inc.,2005:1636–1637.
    [89] Rastogi V, Dalvi N, Garofalakis M. Large-Scale Collective Entity Matching[J].Proc. VLDB Endow.,2011,4(4):208–218.
    [90] McCallum A, Nigam K, Ungar L H. Efcient Clustering of High-DimensionalData Sets with Application to Reference Matching[C]//Proceedings of the sixthACM SIGKDD international conference on Knowledge discovery and data mining(KDD’00). New York, NY, USA: ACM,2000:169–178.
    [91] Bilenko M, Kamath B, Mooney R J. Adaptive Blocking: Learning to Scale UpRecord Linkage[C]//Proceedings of the Sixth International Conference on DataMining (ICDM’06). Washington, DC, USA: IEEE Computer Society,2006:87–96.
    [92] Ko¨pcke H, Thor A, Rahm E. Evaluation of Entity Resolution Approaches on Real-World Match Problems[J]. Proc. VLDB Endow.,2010,3(1-2):484–493.
    [93] Ko¨pcke H, Thor A, Rahm E. Comparative Evaluation of Entity Resolution Ap-proaches with FEVER[J]. Proc. VLDB Endow.,2009,2(2):1574–1577.
    [94] Ko¨pcke H, Rahm E. Frameworks for Entity Matching: A Comparison[J]. DataKnowl. Eng.,2010,69(2):197–210.
    [95] Vernica R, Carey M J, Li C. Efcient Parallel Set-Similarity Joins using MapRe-duce[C]//Proceedings of the2010ACM SIGMOD International Conference onManagement of data (SIGMOD’10). New York, NY, USA: ACM,2010:495–506.
    [96] Kim H s, Lee D. HARRA: Fast Iterative Hashed Record Linkage for Large-ScaleData Collections[C]//Proceedings of the13th International Conference on Extend-ing Database Technology (EDBT’10). New York, NY, USA: ACM,2010:525–536.
    [97] Sarawagi S, Deshpande V S, Kasliwal S. Efcient Top-k Count Queries over Impre-cise Duplicates[C]//Proceedings of the12th International Conference on ExtendingDatabase Technology: Advances in Database Technology (EDBT’09). New York,NY, USA: ACM,2009:450–461.
    [98] Christen P. Development and User Experiences of an Open Source Data Clean-ing, deduplication and record linkage system[J]. SIGKDD Explor. Newsl.,2009,11(1):39–48.
    [99] Chaudhuri S, Chen B C, Ganti V, et al. Example-Driven Design of Efcient RecordMatching Queries[C]//Proceedings of the33rd international conference on Verylarge data bases (VLDB’07). Vienna, Austria: VLDB Endowment,2007:327–338.
    [100] Cohen W W, Ravikumar P D, Fienberg S E. A Comparison of String Distance Met-rics for Name-Matching Tasks[C]//IIWeb. San Francisco, USA: Morgan KaufmannPublishers,2003:73-78.
    [101] Neiling M, Jurk S, Lenz H J, et al. Object Identification Quality[C]//Proceedings ofthe International Workshop on Data Quality in Cooperative Information Systsems(DQCIS). Siena, Italy: Humboldt-Universita¨t zu Berlin,2003:1-10.
    [102] Menestrina D, Whang S E, Garcia-Molina H. Evaluating Entity Resolution Re-sults[J]. Proc. VLDB Endow.,2010,3(1-2):208–219.
    [103] Chen Z, Kalashnikov D V, Mehrotra S. Exploiting Context Analysis for CombiningMultiple Entity Resolution Systems[C]//Proceedings of the2009ACM SIGMODInternational Conference on Management of data (SIGMOD’09). New York, NY,USA: ACM,2009:207–218.
    [104] Augsten N, Bohlen M, Dyreson C, et al. Approximate Joins for Data-CentricXML[C]//Proceedings of the2008IEEE24th International Conference on DataEngineering (ICDE’08). Washington, DC, USA: IEEE Computer Society,2008:814–823.
    [105] Augsten N, Bo¨hlen M, Gamper J. Approximate Matching of Hierarchical Datausing pq-grams[C]//Proceedings of the31st international conference on Very largedata bases (VLDB’05). Trondheim, Norway: VLDB Endowment,2005:301–312.
    [106] Weis M, Naumann F. DogmatiX Tracks down Duplicates in XML[C]//Proceedingsof the2005ACM SIGMOD international conference on Management of data (SIG-MOD’05). New York, NY, USA: ACM,2005:431–442.
    [107] Flesca S, Manco G, Masciari E, et al. Fast Detection of XML Structural Similar-ity[J]. TKDE,2005,17(2):160–175.
    [108] Tatikonda S, Parthasarathy S. Hashing Tree-Structured Data: Methods and Appli-cations[C]//Data Engineering (ICDE),2010IEEE26th International Conferenceon. New York, NY, USA: IEEE,2010:429-440.
    [109]姜国华,姜守旭,王宏志, et al.标签劣质的XML数据上的查询处理[J].计算机科学与探索,2011,8(7):181–188.
    [110] Wang W, Xiao C, Lin X, et al. Efcient Approximate Entity Extraction with EditDistance Constraints[C]//Proceedings of the2009ACM SIGMOD InternationalConference on Management of data (SIGMOD’09). New York, NY, USA: ACM,2009:759–770.
    [111] Namata G M, Kok S, Getoor L. Collective Graph Identification[C]//Proceedingsof the17th ACM SIGKDD international conference on Knowledge discovery anddata mining (KDD’11). New York, NY, USA: ACM,2011:87–95.
    [112] Fan W, Li J, Ma S, et al. Graph Homomorphism Revisited for Graph Matching[J].Proc. VLDB Endow.,2010,3(1-2):1161–1172.
    [113] Fan W. Graph Pattern Matching Revised for Social Network Analy-sis[C]//Proceedings of the15th International Conference on Database Theory(ICDT’12). New York, NY, USA: ACM,2012:8–21.
    [114] Fan W, Li J, Luo J, et al. Incremental Graph Pattern Matching[C]//Proceedings ofthe2011ACM SIGMOD International Conference on Management of data (SIG-MOD’11). New York, NY, USA: ACM,2011:925–936.
    [115] Fan W, Li J, Wang X, et al. Query Preserving Graph Compression[C]//Proceedingsof the2012ACM SIGMOD International Conference on Management of Data(SIGMOD’12). New York, NY, USA: ACM,2012:157–168.
    [116] Ferreira Chaves L W, Buchmann E, Bo¨hm K. Finding Misplaced Items in Re-tail by Clustering RFID Data[C]//Proceedings of the13th International Conferenceon Extending Database Technology (EDBT’10). New York, NY, USA: ACM,2010:501–512.
    [117]苗东菁,石胜飞,李建中.一种局部相关不确定数据库快照集合上的概率频繁最近邻算法[J].计算机研究与发展,2011,10(17):1812–1823.
    [118] Fan W, Li J, Ma S, et al. Interaction between Record Matching and Data Re-pairing[C]//Proceedings of the2011ACM SIGMOD International Conference onManagement of data (SIGMOD’11). New York, NY, USA: ACM,2011:469–480.
    [119]王宏志,李建中,高宏.一种非清洁数据库的数据模型[J].软件学报,2012,3(8):539–550.
    [120] Deutsch A, Fernandez M, Florescu D, et al. A Query Language forXML[C]//Proceedings of the eighth international conference on World Wide Web(WWW’99). New York, NY, USA: Elsevier North-Holland, Inc.,1999:1155–1169.
    [121] Gonzalez G, Tari L, Gitter A, et al. Integrating Knowledge from Biomedical Lit-erature: Normalization and Evidence Statements for Interactions[C]//Proceedingsof the Second BioCreative Challenge Evaluation Workshop. New York, NY, USA:Springer-Verlag New York, Inc.,2007:227-236.
    [122] Jagadish H V, Lakshmanan L V S, Milo T, et al. Querying Network Directo-ries[C]//Proceedings of the1999ACM SIGMOD international conference on Man-agement of data (SIGMOD’99). New York, NY, USA: ACM,1999:133–144.
    [123] Amer-Yahia S, Cho S, Lakshmanan L V S, et al. Tree Pattern Query Minimiza-tion[J]. The VLDB Journal,2002,11(4):315–331.
    [124] Termehchy A, Winslett M. Using Structural Information in XML Keyword SearchEfectively[J]. TODS,2011,36(1).
    [125] Yu B, Li G, Sollins K, et al. Efective Keyword-Based Selection of RelationalDatabases[C]//Proceedings of the2007ACM SIGMOD international conferenceon Management of data (SIGMOD’07). New York, NY, USA: ACM,2007:139–150.
    [126] Sun C, Chan C Y, Goenka A K. Multiway SLCA-Based Keyword Search in XMLData[C]//Proceedings of the16th international conference on World Wide Web(WWW’07). New York, NY, USA: ACM,2007:1043–1052.
    [127] Amer-Yahia S, Cho S, Srivastava D. Tree Pattern Relaxation[C]//Proceedings ofthe8th International Conference on Extending Database Technology: Advances inDatabase Technology (EDBT’02). London, UK, UK: Springer-Verlag,2002:496–513.
    [128] Amer-Yahia S, Lakshmanan L V S, Pandit S. FleXPath: Flexible Structure andFull-Text Querying for XML[C]//Proceedings of the2004ACM SIGMOD inter-national conference on Management of data (SIGMOD’04). New York, NY, USA:ACM,2004:83–94.
    [129] Diao Y, Altinel M, Franklin M J, et al. Path Sharing and Predicate Evaluationfor High-Performance XML Filtering[J]. ACM Transactions on Database Systems(TODS),2003,28(4):467–516.
    [130] Sarawagi S, Bhamidipaty A. Interactive Deduplication using Active Learn-ing[C]//Proceedings of the eighth ACM SIGKDD international conference onKnowledge discovery and data mining (KDD’02). New York, NY, USA: ACM,2002:269–278.
    [131] Cochinwala M, Kurien V, Lalk G, et al. Efcient Data Reconciliation[J]. Infroma-tion Science,2001,137(1-4):1–15.
    [132] Milano D, Scannapieco M, Catarci T. Structure-Aware XML Object Identifica-tion[J]. Bulletin of the Technical Committee on Data Engineering,2006,29(2).
    [133] Puhlmann S, Weis M, Naumann F. XML Duplicate Detection using Sorted Neigh-borhoods[C]//Proceedings of the10th international conference on Advances inDatabase Technology (EDBT’06). Berlin, Heidelberg: Springer-Verlag,2006:773–791.
    [134] Leita o L, Calado P, Weis M. Structure-Based Inference of XML Similarity forFuzzy Duplicate Detection[C]//Proceedings of the sixteenth ACM conference onConference on information and knowledge management (CIKM’07). New York,NY, USA: ACM,2007:293–302.
    [135] Lee K H, Whang K Y, Han W S, et al. Structural Consistency: Enabling XML Key-word Search to Eliminate Spurious Results Consistently[J]. The VLDB Journal,2010,19:503–529.
    [136] Ji S, Li G, Li C, et al. Efcient Interactive Fuzzy Keyword Search[C]//Proceedingsof the18th international conference on World wide web (WWW’09). New York,NY, USA: ACM,2009:371–380.
    [137] Spaccapietra S, Parent C. View Integration: A Step Forward in Solving StructuralConflicts[J]. TKDE,1994,6(2):258–274.
    [138] L. Palopol D S, Ursino D. Semi-Automatic, Semantic Discovery of Propertiesfrom Database Schemes[C]//Proceedings of the1998International Symposium onDatabase Engineering&Applications (IDEAS’98). Washington, DC, USA: IEEEComputer Society,1998:244–.
    [139] Fazzinga B, Flesca S, Furfaro F. XPath Query Relaxation through RewritingRules[J]. IEEE Transactions on Knowledge and Engineering,2011,23(10):1583–1600.
    [140] Kanza Y, Sagiv Y. Flexible Queries over Semistructured Data[C]//Proceedingsof the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles ofdatabase systems (PODS’01). New York, NY, USA: ACM,2001:40–51.
    [141] Koloniari G, Pitoura E. Distributed Structural Relaxation of XPathQueries[C]//Proceedings of the2009IEEE International Conference on Data Engi-neering (ICDE’09). Washington, DC, USA: IEEE Computer Society,2009:529–540.
    [142] Chan C Y, Felber P, Garofalakis M, et al. Efcient Filtering of XML Documentswith XPath Expressions[J]. The VLDB Journal,2002,11(4):354–379.
    [143] Candan K S, Hsiung W P, Chen S, et al. AFilter: Adaptable XML Filtering withPrefix-Caching Sufx-Clustering[C]//Proceedings of the32nd international con-ference on Very large data bases (VLDB’06). Seoul, Korea: VLDB Endowment,2006:559–570.
    [144] Silvasti P, Sippu S, Soisalon-Soininen E. XML-Document-Filtering Automaton[J].Proc. VLDB Endow.,2008,1(2):1666–1671.
    [145] Gionis A, Indyk P, Motwani R. Similarity Search in High Dimensions via Hash-ing[C]//Proceedings of the25th International Conference on Very Large DataBases (VLDB’99). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,1999:518–529.
    [146] Bille P. A Survey on Tree Edit Distance and Related Problems[J]. Theor. Comput.Sci.,2005,337(1-3):217–239.
    [147] Ristad E S, Yianilos P N. Learning String-Edit Distance[J]. IEEE Trans. PatternAnal. Mach. Intell.,1998,20(5):522–532.
    [148] Andoni A, Krauthgamer R. The Computational Hardness of Estimating Edit Dis-tance [Extended Abstract][C]//Proceedings of the48th Annual IEEE Symposiumon Foundations of Computer Science (FOCS’07). Washington, DC, USA: IEEEComputer Society,2007:724–734.
    [149] Gravano L, Ipeirotis P G, Jagadish H V, et al. Approximate String Joins in aDatabase (Almost) for Free[C]//Proceedings of the27th International Conferenceon Very Large Data Bases (VLDB’01). San Francisco, CA, USA: Morgan Kauf-mann Publishers Inc.,2001:491–500.
    [150] Sutinen E, Tarhio J. On Using q-Gram Locations in Approximate String Match-ing[C]//Proceedings of the Third Annual European Symposium on Algorithms(ESA’95). London, UK, UK: Springer-Verlag,1995:327–340.
    [151] Charikar M S. Similarity Estimation Techniques from Rounding Algo-rithms[C]//Proceedings of the thiry-fourth annual ACM symposium on Theory ofcomputing (STOC’02). New York, NY, USA: ACM,2002:380–388.
    [152] Broder A Z, Charikar M, Frieze A M, et al. Min-Wise Independent Permutations(Extended Abstract)[C]//Proceedings of the thirtieth annual ACM symposium onTheory of computing (STOC’98). New York, NY, USA: ACM,1998:327–336.
    [153] Tekli J, Chbeir R. A Novel XML Document Structure Comparison Frameworkbased-on Sub-Tree Commonalities and Label Semantics[J]. Web Semant.,2012,11:14–40.
    [154] Augsten N, Barbosa D, Bohlen M, et al. Efcient Top-k Approximate SubtreeMatching in Small Memory[J]. IEEE Trans. on Knowl. and Data Eng.,2011,23(8):1123–1137.
    [155] Tai K C. The Tree-to-Tree Correction Problem[J]. J. ACM,1979,26(3):422–433.
    [156] Zhang K, Shasha D. Simple Fast Algorithms for the Editing Distance BetweenTrees and Related Problems[J]. SIAM Journal on Computing,1989,18(6):1245–1262.
    [157] Tian Y, Patel J M. TALE: A Tool for Approximate Large Graph Match-ing[C]//Proceedings of the2008IEEE24th International Conference on Data En-gineering (ICDE’08). Washington, DC, USA: IEEE Computer Society,2008:963–972.
    [158] Garofalakis M, Kumar A. XML Stream Processing using Tree-Edit Distance Em-beddings[J]. ACM Trans. Database Syst.,2005,30(1):279–332.
    [159] Calado P, Herschel M, Leita¨o L. An Overview of XML Duplicate Detection Al-gorithms[M]. Studies in Fuzziness and Soft Computing, vol.255. New York, NY,USA: Springer Berlin/Heidelberg,2010:193–224.
    [160] Dorneles C F, Goncalves R, dos Santos Mello R. Approximate Data InstanceMatching: A Survey[J]. Knowl. Inf. Syst.,2011,27(1):1–21.
    [161] Bansal N, Blum A, Chawla S. Correlation Clustering[J]. Machine Learning,2004,56(1-3):89–113.
    [162] Soon W M, Ng H T, Lim D C Y. A Machine Learning Approach to CoreferenceResolution of Noun Phrases[J]. Computational Linguistics,2001,27(4):521–544.
    [163] Malioutov I, Barzilay R. Minimum Cut Model for Spoken Lecture Segmenta-tion[C]//Proceedings of the21st International Conference on Computational Lin-guistics and the44th annual meeting of the Association for Computational Linguis-tics (ACL-44). Stroudsburg, PA, USA: Association for Computational Linguistics,2006:25–32.
    [164] Ailon N, Charikar M, Newman A. Aggregating Inconsistent Information: Rankingand Clustering[J]. Journal of ACM,2008,55(5):23:1–23:27.
    [165] Charikar M, Guruswami V, Wirth A. Clustering with Qualitative Information[J].Journal of Computer and System Sciences,2005,71(3):360–383.
    [166] Demaine E D, Emanuel D, Fiat A, et al. Correlation Clustering in General WeightedGraphs[J]. Theoretical Computer Science,2006,361(2):172–187.
    [167] Fellows M R, Guo J, Kanj I. The Parameterized Complexity of Some MinimumLabel Problems[J]. Journal of Computer and System Sciences,2010,76:727–740.
    [168] Bru¨Ggemann T, Monnot J, Woeginger G J. Local Search for the Minimum La-bel Spanning Tree Problem with Bounded Color Classes[J]. Operations ResearchLetters,2003,31(3):195–201.
    [169] Hassin R, Monnot J, Segev D. Approximation Algorithms and Hardness Results forLabeled Connectivity Problems[J]. Journal of Combinatorial Optimization,2007,14:437–453.
    [170] Broersma H, Li X, Woeginger G, et al. Paths and Cycles in Colored Graphs[J].Australasian J. Combin.,2005,31:299–311.
    [171] Carr R D, Doddi S, Konjevod G, et al. On the Red-Blue Set Cover Prob-lem[C]//Proceedings of the eleventh annual ACM-SIAM symposium on Discretealgorithms (SODA’00). Philadelphia, PA, USA: Society for Industrial and AppliedMathematics,2000:345–353.
    [172] Jha S, Sheyner O, Wing J. Two Formal Analys s of Attack Graphs[C]//Proceedingsof the15th IEEE workshop on Computer Security Foundations (CSFW’02). Wash-ington, DC, USA: IEEE Computer Society,2002:49–.
    [173] Zhang P, Cai J Y, Tang L Q, et al. Approximation and Hardness Results for La-bel Cut and Related Problems[J]. Journal of Combinatorial Optimization,2011,21:192–208.
    [174] Garey M R, Johnson D S. Computers and Intractability: A Guide to the Theory ofNP-Completeness[M]. New York, NY, USA: W. H. Freeman&Co.,1990:187–285.
    [175] Papadimitriou C, Yannakakis M. Optimization, Approximation, and ComplexityClasses[C]//Proceedings of the twentieth annual ACM symposium on Theory ofcomputing (STOC’88). New York, NY, USA: ACM,1988:229–234.
    [176] Elsner M, Schudy W. Bounding and Comparing Methods for Correlation Cluster-ing Beyond ILP[C]//Integer Linear Programming for Natural Language Processing.Stroudsburg, PA, USA: Association for Computational Linguistics,2009:19-27.
    [177] Staworko S, Wieczorek P. Learning Twig and Path Queries[C]//Proceedings of the15th International Conference on Database Theory (ICDT’12). New York, NY,USA: ACM,2012:140–154.
    [178] Zuckerman D. Linear Degree Extractors and the Inapproximability of Max Cliqueand Chromatic Number[C]//Proceedings of the thirty-eighth annual ACM sympo-sium on Theory of computing (STOC’06). New York, NY, USA: ACM,2006:681–690.
    [179] Miyano S, Shinohara A, Shinohara T. Polynomial-Time Learning of ElementaryFormal Systems[J]. New Generation Computing,2000,18(3):217–242.
    [180] Gold E M. Language Identification in the Limit[J]. Information and Control,1967,10(5):447–474.
    [181] Angluin D. Inductive Inference of Formal Languages from Positive Data[J]. Infor-mation and Control,1980,45(2):117–135.
    [182] Angluin D. Learning Regular Sets from Queries and Counterexamples[J]. Infor-mation and Computation,1987,75:87–106.
    [183] Higuera C d l. Characteristic Sets for Polynomial Grammatical Inference[J]. Ma-chine Learning,1997,27(2):125–138.
    [184] Angluin D. Negative Results for Equivalence Queries[J]. Machine Learning,1990,5(2):121–150.
    [185] Carme J, Ceresna M, Goebel M. Query-Based Learning of XPath Expres-sions[C]//Proceedings of the8th international conference on Grammatical Infer-ence: algorithms and applications (ICGI’06). Berlin, Heidelberg: Springer-Verlag,2006:342–343.
    [186] Carme J, Gilleron R, Lemay A, et al. Interactive Learning of Node Selecting TreeTransducer[J]. Machine Learning,2007,66(1):33–67.
    [187] Lemay A, Niehren J, Gilleron R. Learning n-ary Node Selecting Tree Transduc-ers from Completely Annotated Examples[C]//Proceedings of the8th internationalconference on Grammatical Inference: algorithms and applications (ICGI’06).Berlin, Heidelberg: Springer-Verlag,2006:253–267.
    [188] Bex G J, Neven F, Schwentick T, et al. Inference of Concise Regular Expressionsand DTDs[J]. ACM Transactions on Database Systems (TODS),2010,35(2):11:1–11:47.
    [189] Das Sarma A, Parameswaran A, Garcia-Molina H, et al. Synthesizing View Defini-tions from Data[C]//Proceedings of the13th International Conference on DatabaseTheory (ICDT’10). New York, NY, USA: ACM,2010:89–103.
    [190] Downey R G, Fellows M R. Fixed-Parameter Tractability and Completeness I:Basic Results[J]. SIAM Journal of Computing,1995,24(4):873–921.
    [191] Downey R G, Fellows M R. Fixed-Parameter Tractability and Completeness II: OnCompleteness for W[1][J]. Theoretical Computer Science,1995,141(1&2):109–131.
    [192] Flum J, Grohe M. Parameterized Complexity Theory[M]. New York, NY, USA:Springer,2006:33–37.
    [193] Downey R, Fellows M. Complexity Theory[M]. New York, NY, USA: CambridgeUniversity Press,1993:191–225.
    [194] Grohe M. Parameterized Complexity for the Database Theorist[J]. ACM SIGMODRecord,2002,31(4):86–96.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700