KORD-I: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming

Athanasios Kordelas, Thanasis Spyrou, Spyros Voulgaris, Vasileios Megalooikonomou, Nikos Deligiannis

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Abstract

Apache Spark is one of the most commonly used
frameworks for Big Data processing. Research on its built-in
streaming dynamic resource allocation feature has shown
that large data load fluctuations, for instance in website traffic,
have a negative impact on automatic scaling. Research has
also indicated that the lack of data load prediction, which
aims to identify the expected increase in data load during
peak hours/days, is the root cause of this issue.
Hence, this paper proposes an enhanced solution, namely KORD-I
(Knowledge-based Orchestrated Resource DIstribution), which aims
to optimise the allocation of Spark resources for Streaming
applications in real time using a SARIMAX model.
The experimental evaluation shows that the proposed solution
provides a cost reduction of 38% without affecting stability.
Original language: English
Title of host publication: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-23)
Pages: 1-3
Number of pages: 3
Publication status: Accepted/In press - 2023
Event: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) - Raleigh, North Carolina, United States
Duration: 23 Apr 2023 - 25 Apr 2023
https://ispass.org/ispass2023/

Conference

Conference: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Country/Territory: United States
City: Raleigh
Period: 23/04/23 - 25/04/23
