Declarative Rules for Inferring Fine-Grained Data Provenance from Scientific Workflow Execution Traces
  • Authors: Shawn Bowers (18)
    Timothy McPhillips (19)
    Bertram Ludäscher (20)
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2012
  • Volume: 7525
  • Issue: 1
  • Pages: 97-110
  • Affiliations: Shawn Bowers (18)
    Timothy McPhillips (19)
    Bertram Ludäscher (20)

    18. Dept. of Computer Science, Gonzaga University, USA
    19. Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Stanford University, USA
    20. Dept. of Computer Science, University of California Davis, USA
Abstract
Fine-grained dependencies within scientific workflow provenance specify lineage relationships between a workflow result and the input data, intermediate data, and computation steps used in the result’s derivation. This information is often needed to determine the quality and validity of scientific data, and as such, plays a key role in both provenance standardization efforts and provenance query frameworks. While most scientific workflow systems can record basic information concerning the execution of a workflow, they typically fall into one of three categories with respect to recording dependencies: (1) they rely on workflow computation steps to declare dependency relationships at runtime; (2) they impose implicit assumptions concerning dependency patterns from which dependencies are automatically inferred; or (3) they do not assert any dependency information at all. We present an alternative approach that decouples dependency inference from workflow systems and underlying execution traces. In particular, we present a high-level declarative language for expressing explicit dependency rules that can be applied (at any time) to workflow trace events to generate fine-grained dependency information. This approach not only makes provenance dependency rules explicit, but also allows rules to be specified and refined by different users as needed. We present our dependency rule language and an implementation that rewrites dependency rules into relational queries over the underlying workflow traces. We also demonstrate the language using common types of dependency patterns found within scientific workflows.
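
To make the idea of rewriting a dependency rule into a relational query over trace events more concrete, the following is a minimal sketch and not the authors' rule language or implementation. The `events` table layout, the `infer_dependencies` helper, and the sample trace are all assumptions made for illustration. The single rule encoded here is one simple dependency pattern: within one invocation of a step, every item written depends on every item read.

```python
import sqlite3


def infer_dependencies(events):
    """Apply one hypothetical rule to a list of trace events.

    Each event is (step, invocation, kind, item), where kind is
    'read' or 'write'. This schema is an assumption for the sketch,
    not the trace model used in the paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (step TEXT, invocation INTEGER, kind TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    # The rule expressed as a relational query: pair each write with
    # every read that occurred in the same invocation of the same step.
    rows = conn.execute(
        """
        SELECT DISTINCT w.item, r.item
        FROM events AS w
        JOIN events AS r
          ON w.step = r.step AND w.invocation = r.invocation
        WHERE w.kind = 'write' AND r.kind = 'read'
        ORDER BY w.item, r.item
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical trace: an alignment step followed by a tree-building step.
TRACE = [
    ("align", 1, "read", "seq1"),
    ("align", 1, "read", "seq2"),
    ("align", 1, "write", "alignment"),
    ("tree", 1, "read", "alignment"),
    ("tree", 1, "write", "phylogeny"),
]

DEPENDENCIES = infer_dependencies(TRACE)
for derived, source in DEPENDENCIES:
    print(f"{derived} depends on {source}")
```

Because the rule lives in a query rather than in the workflow engine, it can be run against a stored trace at any time and swapped for a different rule without re-executing the workflow, which is the decoupling the abstract describes.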
