Challenges in Resource Provisioning for the Execution of Data Wrangling Workflows on the Cloud: A Case Study

Authors Abdullah Khalid A. Almasaud, Agresh Bharadwaj, Sandra Sampaio, Rizos Sakellariou
Title Challenges in Resource Provisioning for the Execution of Data Wrangling Workflows on the Cloud: A Case Study
Abstract Data Wrangling (DW) is an essential component of any big data analytics job, encompassing a large variety of complex operations to transform, integrate and clean sets of unrefined data. The inherent complexity and execution cost associated with DW workflows make the provisioning of resources from a cloud provider a sensible solution for executing these workflows in a reasonable amount of time. However, the lack of detailed profiles of the input data and the operations composing these workflows makes the selection of resources to run these workflows on the cloud a hard task due to the large search space to select appropriate resources, their interactions, dependencies, trade-offs and prices that need to be considered. In this paper, we investigate the complex problem of provisioning cloud resources to DW workflows, by carrying out a case study on a specific Traffic DW workflow from the Smart Cities domain. We carry out a number of simulations where we change resource provisioning, focusing on what may impact the execution of the DW workflow most. The insights obtained from our results suggest that fine-grained cloud resource provisioning based on workflow execution profile and input data properties has the potential to improve resource utilization and prevent significant over- and under-provisioning.
ISBN 978-1-7281-0858-2
Conference The 31st International Conference on Database and Expert Systems Applications (DEXA 2020)
Date 14-17 September 2020
Location Bratislava, Slovakia
Url https://zenodo.org/record/4317723#.X9h2LC0RqZw
DOI https://doi.org/10.1007/978-3-030-59051-2_5