Abstract
Cloud computing environments have become pivotal platforms for data wrangling in big data analytics, supporting the ingestion and transformation of vast datasets. This paper explores optimization strategies for data wrangling in these environments and addresses the significant security and performance challenges encountered when data pipelines execute on cloud platforms. It proposes a novel strategy that combines two measures: executing data pipelines within the customer's Virtual Private Cloud (VPC), and employing pushdown optimization for data transformation tasks in cloud data warehouses and databases. Together, these measures seek to enhance both security and performance. The paper examines the theoretical underpinnings and practical applications of these strategies and compares them with traditional data-wrangling methods to underscore the performance and security benefits. It also assesses the implications of the approach for cost, scalability, and manageability within cloud architectures. The findings offer insights and recommendations for deploying these optimization techniques in practice and set the stage for future research on refining data-wrangling practices in cloud environments.
Publisher
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP