Thesis defended
Published on 14 May 2019 | Updated on 16 February 2021
Quality of Service Aware Mechanisms for (Re)Configuring Data Stream Processing Applications on Highly Distributed Infrastructure
Alexandre DA SILVA VEITH
Thesis supervised by: Laurent Lefèvre, INRIA, LIP, ENS de Lyon
Discipline: Computer Science
The thesis
The increasing availability of sensors and Internet-connected devices has led to an explosion in the volume, variety and velocity of generated data that requires analysis. In several application scenarios, such as smart cities, monitoring of large infrastructures, and the Internet of Things, continuous data streams must be processed in near real time. Several frameworks have been proposed for data stream processing, many of which have been deployed in cloud environments, aiming to benefit from characteristics such as elasticity. Elasticity, when properly exploited, refers to the ability of a cloud to allow a service to allocate additional resources or release idle capacity on demand. Although early efforts have been made towards making stream processing more elastic [1,2,3], many issues remain unaddressed. Most stream processing services follow a dataflow approach: they are Directed Acyclic Graphs (DAGs) of so-called operators, which execute User Defined Functions (UDFs), and whose placement on available resources, bottleneck identification, and adaptation can be difficult, especially when these services are part of a larger infrastructure that comprises other types of execution models.

The research goals are to investigate architectures and resource management algorithms for attaining elastic and distributed data stream processing. At an architectural level, the goal is to design resource management models that can exploit resources from both the edges of the Internet and traditional cloud infrastructure. From a cloud provider's perspective, a goal is to investigate algorithms and mechanisms that provide elasticity to distributed stream processing services and other big-data applications that perform periodic analyses. Such algorithms aim to reduce fixed and variable costs, such as electricity, while respecting quality-of-service metrics.

The thesis will work towards instrumenting existing data stream processing frameworks that follow a dataflow approach or that use a discretised stream approach [4], and building performance and energy consumption models for certain target applications and analytics solutions. Such models will be used to devise strategies for addressing the elasticity of stream processing on cloud infrastructure, establishing triggers for allocating or releasing resource capacity, and adapting services to resource availability while considering energy efficiency.
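To make the dataflow model concrete, the sketch below represents a stream processing application as a DAG of operators, each wrapping a user-defined function, and assigns the operators greedily to edge and cloud resources. This is a minimal illustration only: the operator names, the cpu_demand metric, and the first-fit heuristic are assumptions made for the example, not the placement algorithms studied in the thesis.

# Minimal, illustrative sketch (not the thesis's actual model): a stream
# processing application as a DAG of operators, each wrapping a user-defined
# function (UDF), with a naive placement of operators onto edge or cloud nodes.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Operator:
    name: str
    udf: Callable                     # user-defined function applied to each event
    cpu_demand: float                 # hypothetical cost metric used for placement
    downstream: List["Operator"] = field(default_factory=list)

@dataclass
class Resource:
    name: str
    location: str                     # "edge" or "cloud"
    cpu_capacity: float

def place_operators(sources: List[Operator], resources: List[Resource]) -> Dict[str, str]:
    """Greedy first-fit placement: walk the DAG from the sources and assign each
    operator to the first resource with enough remaining capacity, preferring
    edge nodes so that early operators stay close to the data sources."""
    remaining = {r.name: r.cpu_capacity for r in resources}
    ordered = sorted(resources, key=lambda r: r.location != "edge")   # edge nodes first
    placement: Dict[str, str] = {}
    frontier = list(sources)
    while frontier:
        op = frontier.pop(0)
        if op.name in placement:
            continue
        for r in ordered:
            if remaining[r.name] >= op.cpu_demand:
                placement[op.name] = r.name
                remaining[r.name] -= op.cpu_demand
                break
        frontier.extend(op.downstream)
    return placement

# Example: source -> filter -> aggregate, placed on one edge node and one cloud node.
sink = Operator("aggregate", udf=sum, cpu_demand=2.0)
filt = Operator("filter", udf=lambda e: e > 0, cpu_demand=1.0, downstream=[sink])
src = Operator("source", udf=lambda e: e, cpu_demand=0.5, downstream=[filt])
print(place_operators([src], [Resource("edge-1", "edge", 2.0), Resource("cloud-1", "cloud", 8.0)]))
# -> {'source': 'edge-1', 'filter': 'edge-1', 'aggregate': 'cloud-1'}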
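Similarly, an elasticity trigger can be sketched as a simple threshold rule on a quality-of-service metric. The target_ms, slack and replica bounds below are hypothetical parameters chosen for illustration; they only suggest what "triggers for allocating or releasing resource capacity" might look like, and are not mechanisms taken from the thesis.

# Illustrative threshold-based elasticity trigger (an assumption, not a mechanism
# described in the thesis): scale an operator's parallelism up when a QoS metric
# such as end-to-end latency exceeds its target, and release replicas when the
# system has been comfortably under the target.
def elasticity_decision(latency_ms: float, target_ms: float,
                        replicas: int, min_replicas: int = 1,
                        max_replicas: int = 16, slack: float = 0.5) -> int:
    """Return the new number of replicas for one operator."""
    if latency_ms > target_ms and replicas < max_replicas:
        return replicas + 1            # allocate additional capacity
    if latency_ms < slack * target_ms and replicas > min_replicas:
        return replicas - 1            # release idle capacity
    return replicas                    # within the target band: no change

# Example: a monitoring loop would periodically feed observed latency into the rule.
print(elasticity_decision(latency_ms=320.0, target_ms=250.0, replicas=2))   # -> 3
print(elasticity_decision(latency_ms=90.0, target_ms=250.0, replicas=3))    # -> 2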
References:
[1] Sattler, Kai-Uwe, and Felix Beier. "Towards Elastic Stream Processing: Patterns and Infrastructure." BD3@VLDB, 2013.
[2] Gedik, Bugra, et al. "Elastic Scaling for Data Stream Processing." IEEE Transactions on Parallel and Distributed Systems 25.6 (2014): 1447-1463.
[3] Tran, Dang-Hoan, Mohamed Medhat Gaber, and Kai-Uwe Sattler. "Change Detection in Streaming Data in the Era of Big Data: Models and Issues." ACM SIGKDD Explorations Newsletter 16.1 (2014): 30-38.
[4] Zaharia, Matei, et al. "Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters." Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, 2012.
Keywords
Elasticity - Big data analytics - Stream Processing