- Common Lisp
Jörgen Brandt is funded by the EU project BiobankCloud.
In many scientific areas, e.g., bioinformatics, the growth in data volume and workflow complexity calls for workflow languages tailored towards parallelism and software integration.
While data-parallel dataflow systems like Hadoop, Spark, or Flink can scale to a large number of nodes, scientific workflow systems like KNIME or Galaxy integrate arbitrary software, including command line tools or libraries with R or Python interfaces.
How general can a workflow language that focuses on parallelism and integration be? Can it host conditionals, compound data structures, and unbounded iteration? How should such workflows be composed from smaller parts?
Cuneiform is a workflow specification language that makes it easy to integrate heterogeneous tools and libraries and to exploit data parallelism. Users do not have to create heavyweight wrappers for established tools or reimplement them. Instead, they apply their existing software to partitioned data. Using the Hi-WAY application master, Cuneiform can be executed on Hadoop YARN, which makes it suitable for large-scale data analysis.
Cuneiform comes in the form of a functional programming language with a Foreign Function Interface (FFI) that lets users create functions in any suitable scripting language and apply these functions in a uniform way.
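The idea of wrapping foreign scripts as functions and applying them uniformly to partitioned data can be sketched in plain Python. This is only an illustration of the concept, not Cuneiform syntax and not how Cuneiform or Hi-WAY are implemented. The helpers `foreign_function` and `map_partitions` are hypothetical names introduced here, and the example assumes that each script reads one partition from standard input and writes its result to standard output.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def foreign_function(interpreter, script):
    """Wrap a script body as a Python callable (hypothetical helper).

    Assumption: the script reads its input from stdin and prints its
    result to stdout.
    """
    def call(data):
        result = subprocess.run(
            interpreter + [script],  # e.g. ["bash", "-c", script]
            input=data,
            capture_output=True,
            text=True,
            check=True,  # raise if the foreign script fails
        )
        return result.stdout.strip()
    return call


def map_partitions(func, partitions, workers=4):
    """Apply func to every partition independently, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, partitions))


# A "foreign" task that counts the words in one partition. The Python
# interpreter stands in for any scripting language to keep this portable.
count_words = foreign_function(
    [sys.executable, "-c"],
    "import sys; print(len(sys.stdin.read().split()))",
)

partitions = ["a b c", "d e", "f"]
counts = [int(c) for c in map_partitions(count_words, partitions)]
total = sum(counts)
```

Because each partition is processed by an independent call, the per-partition work can run in parallel, and the existing script is reused unchanged rather than being reimplemented.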
Brandt, J., Bux, M. and Leser, U. (2015). Cuneiform: A Functional Language for Large Scale Scientific Data Analysis. Algorithms and Systems for MapReduce and Beyond (BeyondMR2015). Brussels, Belgium.