DB-IR integration using tight-coupling in the Odysseus DBMS
  • Authors: Kyu-Young Whang ; Jae-Gil Lee ; Min-Jae Lee ; Wook-Shin Han ; Min-Soo Kim…
  • Keywords: Tight-coupling ; Information retrieval ; DB-IR integration ; Odysseus
  • Journal: World Wide Web
  • Publication year: 2015
  • Publication date: May 2015
  • Volume: 18
  • Issue: 3
  • Pages: 491-520
  • Full-text size: 1,975 KB
  • Author affiliations: Kyu-Young Whang (1)
    Jae-Gil Lee (2)
    Min-Jae Lee (1)
    Wook-Shin Han (3)
    Min-Soo Kim (4)
    Jun-Sung Kim (1)

    1. Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Korea
    2. Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Korea
    3. Department of Creative IT Engineering/Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-ro, Nam-gu, Pohang-si, Gyeongbuk, 790-784, Korea
    4. Department of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science & Technology (DGIST), 333 Technojungang-daero, Hyeonpung-myeon, Dalseong-gun, Daegu, 711-873, Korea
  • Journal category: Computer Science
  • Journal subjects: Information Systems Applications and The Internet
    Database Management
    Operating Systems
  • Publisher: Springer Netherlands
  • ISSN:1573-1413
Abstract
As many recent applications require integration of structured data and text data, unifying database (DB) and information retrieval (IR) technologies has become one of the major challenges in our field. There have been active discussions on the system architecture for DB-IR integration, but a clear agreement has not yet been reached. Along this direction, we have advocated the use of the tight-coupling architecture and developed a novel structure of the IR index as well as tightly-coupled query processing algorithms. In tight-coupling, the text data type is supported by the storage system just like a built-in data type, so that the query processor can efficiently handle queries involving both structured data and text data. In this paper, for archival purposes, we consolidate our achievements reported in non-regular publications over the last ten years or so, extending them by adding greater detail on the IR index and the query processing algorithms. All the features in this paper are fully implemented in the Odysseus DBMS, which has been under development at KAIST for over 23 years. We show that Odysseus significantly outperforms two open-source DBMSs and one open-source search engine (with some exceptional cases) in processing DB-IR integration queries. These results demonstrate the superiority of the tight-coupling architecture for DB-IR integration.
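
The general idea of a query that mixes a keyword condition with a structured condition can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the IR index structure or the query processing algorithms of Odysseus, which the abstract does not specify. The names `ToyTable`, `insert`, `search`, and the `price`/`description` columns are hypothetical and chosen only for illustration. The sketch keeps an inverted index (keyword to row ids) inside the same storage object as the rows, so one call can intersect posting lists and then apply the structured predicate.

```python
from collections import defaultdict


class ToyTable:
    """Toy table whose text column is indexed inside the same storage layer.

    Hypothetical illustration only; not the Odysseus index or algorithms.
    """

    def __init__(self):
        self.rows = {}                    # row id -> (price, description)
        self.postings = defaultdict(set)  # keyword -> set of row ids

    def insert(self, row_id, price, description):
        # The row and its postings are maintained together on insert.
        self.rows[row_id] = (price, description)
        for word in description.lower().split():
            self.postings[word].add(row_id)

    def search(self, keywords, max_price):
        """Return sorted row ids that contain every keyword and cost <= max_price."""
        lists = [self.postings.get(w.lower(), set()) for w in keywords]
        if not lists:
            return []
        # Text condition: intersect the posting lists of all keywords.
        candidates = set.intersection(*lists)
        # Structured condition: filter the candidates on the price column.
        return sorted(r for r in candidates if self.rows[r][0] <= max_price)


table = ToyTable()
table.insert(1, 30, "wireless optical mouse")
table.insert(2, 80, "wireless mechanical keyboard")
table.insert(3, 20, "wired optical mouse")

print(table.search(["wireless"], 50))
print(table.search(["optical", "mouse"], 25))
```

In this toy, both conditions are evaluated in one pass over one data structure, rather than by a separate search engine whose results are joined with the database afterwards.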