Defended thesis
Published on 14 May 2019 | Updated on 16 February 2021

Quality of Service Aware Mechanisms for (Re)Configuring Data Stream Processing Applications on Highly Distributed Infrastructure

Alexandre DA SILVA VEITH

Thesis supervised by: Laurent Lefèvre, INRIA, LIP, ENS de Lyon
Discipline: Computer Science

The thesis

The increasing availability of sensors and Internet-connected devices has led to an explosion in the volume, variety and velocity of the data generated, much of which requires some form of analysis. In several application scenarios, such as smart cities, the monitoring of large infrastructures, and the Internet of Things, continuous data streams must be processed in near real time. Several frameworks have been proposed for data stream processing, many of which have been deployed in cloud environments to benefit from characteristics such as elasticity. Elasticity refers to the ability of a cloud to let a service allocate additional resources or release idle capacity on demand. Although early efforts have been made towards making stream processing more elastic [1,2,3], many issues remain unaddressed. Most stream processing services follow a dataflow approach: applications are Directed Acyclic Graphs (DAGs) of so-called operators that execute User Defined Functions (UDFs). Placing these operators on the available resources, identifying bottlenecks, and adapting the application at runtime can be difficult, especially when these services are part of a larger infrastructure that comprises other execution models.

The research goals are to investigate architectures and resource management algorithms for elastic and distributed data stream processing. At the architectural level, the goal is to design resource management models that can exploit resources both at the edges of the Internet and in traditional cloud infrastructure. From a cloud provider's perspective, the goal is to investigate algorithms and mechanisms that provide elasticity to distributed stream processing services and to other big-data applications that perform periodic analyses. Such algorithms aim to reduce fixed and variable costs, such as electricity, whilst meeting quality of service requirements.
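To make the dataflow terminology above more concrete, the following Python sketch models an application as a DAG of operators, propagates an input tuple rate through it in topological order, flags the operator with the highest utilisation as the bottleneck, and applies a simple threshold trigger that adds or removes one replica per operator. It is a minimal illustration under stated assumptions, not the mechanism developed in the thesis: the operator names, per-replica service rates, selectivities and the 0.8/0.3 utilisation thresholds are all hypothetical, and utilisation is approximated as arrival rate divided by total service capacity.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Operator:
    name: str
    service_rate: float       # tuples/s one replica can process (hypothetical, e.g. from profiling)
    selectivity: float = 1.0  # output tuples emitted per input tuple
    replicas: int = 1


def arrival_rates(ops, edges, source_rate):
    """Propagate tuple rates through the DAG in topological order.

    Assumption: every operator without predecessors receives `source_rate`,
    and each outgoing edge carries the operator's full output stream.
    """
    preds = {name: set() for name in ops}
    for u, v in edges:
        preds[v].add(u)
    rates = {name: 0.0 for name in ops}
    # Topological order guarantees all upstream contributions are summed
    # before an operator's own output rate is computed.
    for name in TopologicalSorter(preds).static_order():
        if not preds[name]:
            rates[name] = source_rate
        out = rates[name] * ops[name].selectivity
        for u, v in edges:
            if u == name:
                rates[v] += out
    return rates


def plan_scaling(ops, edges, source_rate, high=0.8, low=0.3):
    """Hypothetical threshold trigger: +1 replica above `high`, -1 below `low`."""
    rates = arrival_rates(ops, edges, source_rate)
    util = {n: rates[n] / (op.service_rate * op.replicas) for n, op in ops.items()}
    bottleneck = max(util, key=util.get)  # most heavily loaded operator
    plan = {}
    for n, op in ops.items():
        if util[n] > high:
            plan[n] = op.replicas + 1            # allocate capacity
        elif util[n] < low:
            plan[n] = max(1, op.replicas - 1)    # release capacity, never below one replica
        else:
            plan[n] = op.replicas
    return bottleneck, util, plan


# Hypothetical four-operator pipeline fed at 800 tuples/s.
ops = {o.name: o for o in [
    Operator("parse", service_rate=1200),
    Operator("filter", service_rate=2000, selectivity=0.5),
    Operator("enrich", service_rate=300),
    Operator("sink", service_rate=5000),
]}
edges = [("parse", "filter"), ("filter", "enrich"), ("enrich", "sink")]

bottleneck, util, plan = plan_scaling(ops, edges, source_rate=800)
print(bottleneck, plan)  # prints: enrich {'parse': 1, 'filter': 1, 'enrich': 2, 'sink': 1}
```

Here the hypothetical `enrich` operator receives 400 tuples/s while one replica handles only 300, so it is reported as the bottleneck and receives a second replica, whereas the under-utilised `sink` stays at its one-replica floor. Threshold rules of this kind are easy to reason about but can oscillate under bursty load, and the sketch deliberately ignores where replicas are placed across edge and cloud resources.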
The thesis will work towards instrumenting existing data stream processing frameworks that follow a dataflow approach or use a discretised stream approach [4], and towards building performance and energy consumption models for selected target applications and analytics solutions. These models will be used to devise strategies for elastic stream processing on cloud infrastructure, to establish triggers for allocating or releasing resource capacity, and to adapt services to resource availability whilst accounting for energy efficiency.

References:
[1] Sattler, Kai-Uwe, and Felix Beier. "Towards Elastic Stream Processing: Patterns and Infrastructure." BD3@VLDB. 2013.
[2] Gedik, Bugra, et al. "Elastic scaling for data stream processing." IEEE Transactions on Parallel and Distributed Systems 25.6 (2014): 1447-1463.
[3] Tran, Dang-Hoan, Mohamed Medhat Gaber, and Kai-Uwe Sattler. "Change detection in streaming data in the era of big data: models and issues." ACM SIGKDD Explorations Newsletter 16.1 (2014): 30-38.
[4] Zaharia, Matei, et al. "Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters." Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing. 2012.