Stream processing has been an active research field for more than 20 years, but it is only now reaching its prime, owing to recent successful efforts by the research community and numerous open-source communities worldwide. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between the first (’00–’10) and second (’11–’23) generations of stream processing systems, and discuss future trends and open problems.
We thank the anonymous VLDBJ reviewers for their detailed and valuable feedback on prior drafts of this paper. This work was partially supported by a Google DAPA award, WASP NESTS (Data-Bound Computing), and the Dutch Research Council (NWO) Vidi project No. 19708.