SAASFEE: Scalable Scientific Workflow Execution Engine
2015 (English) In: Proceedings of the VLDB Endowment, 2015, 10, Vol. 8, p. 1892-1903. Conference paper, Published paper (Refereed)
Abstract [en]
Across many fields of science, primary data sets like sensor read-outs, time series, and genomic sequences are analyzed by complex chains of specialized tools and scripts exchanging intermediate results in domain-specific file formats. Scientific workflow management systems (SWfMSs) support the development and execution of these tool chains by providing workflow specification languages, graphical editors, fault-tolerant execution engines, etc. However, many SWfMSs are not prepared to handle large data sets because of inadequate support for distributed computing. On the other hand, most SWfMSs that do support distributed computing only allow static task execution orders. We present SAASFEE, a SWfMS which runs arbitrarily complex workflows on Hadoop YARN. Workflows are specified in Cuneiform, a functional workflow language focusing on parallelization and easy integration of existing software. Cuneiform workflows are executed on Hi-WAY, a higher-level scheduler for running workflows on YARN. Distinct features of SAASFEE are the ability to execute iterative workflows, an adaptive task scheduler, re-executable provenance traces, and compatibility with selected other workflow systems. In the demonstration, we present all components of SAASFEE using real-life workflows from the field of genomics.
Place, publisher, year, edition, pages
2015, 10. Vol. 8, p. 1892-1903
Series
Proceedings of the VLDB Endowment, ISSN 2150-8097 ; 8
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-24506
DOI: 10.14778/2824032.2824094
Scopus ID: 2-s2.0-84953879839
OAI: oai:DiVA.org:ri-24506
DiVA, id: diva2:1043590
Conference
41st International Conference on Very Large Data Bases, August 31 - September 4, 2015, Kohala Coast, US
2016-10-31 2016-10-31 2023-05-22 Bibliographically approved