Dataflow systems are widely used today for building and running continuous data-intensive applications. However, the unavoidable semantic gap between the host languages of dataflow system libraries and the dataflow model creates programmability limitations that hinder performance, safety, and ease of use. We propose AquaLang, a new language designed for dataflow systems. Programs in AquaLang blend strongly typed relational and functional syntax and are verified using an effect system that prevents the undefined behaviour that can arise when user-defined logic violates dataflow semantics. AquaLang also supports unverified external code through a novel use of sandboxing. Furthermore, on top of the standard dataflow optimisations employed by current systems, AquaLang’s ability to analyse algebraic properties of user-defined functions unlocks deeper dataflow program rewriting. In our evaluation, we measure speedups of up to one order of magnitude for Nexmark queries against hand-written Flink programs, which we attribute to pushdown and window incrementalisation techniques.
The research behind AquaLang was supported by Vinnova (Grant No.: 2022-03036), the Swedish Foundation for Strategic Research (Grant No.: BD15-0006), Wallenberg AI NEST (DataBound Computing), and the Swedish e-Science Research Centre (SeRC).