
Developing agile ETL flows with Ballerina

By admin

Aug 26, 2024



// Create the spreadsheet and write the header row.
sheets:Spreadsheet sheet = check spreadsheetClient->createSpreadsheet(sheetName);
_ = check spreadsheetClient->
   appendValue(sheet.spreadsheetId, ["Product", "Sales", "Date"], {sheetName: workSheetName});
// Append one row per record in the sales summary.
foreach var {product, sales, date} in salesSummary {
   _ = check spreadsheetClient->
       appendValue(sheet.spreadsheetId, [product, sales, date], {sheetName: workSheetName});
}

Deploying and testing ETL flows

Developing individual ETL tasks as microservices allows the entire ETL flow to be deployed in a Kubernetes cluster. Each ETL task can run as a pod in its own Kubernetes deployment, so the number of pods for each task can be scaled up or down independently based on load. However, organizations usually have multiple ETL flows, each with many tasks, and these flows can be owned by different teams. It is therefore crucial to have proper CI/CD pipelines, permission models, monitoring capabilities, and separate environments for development, testing, performance validation, and production.
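As a rough illustration of scaling one ETL task independently, the manifest below sketches a Deployment and a HorizontalPodAutoscaler for a single task. The names, image reference, resource requests, and thresholds (`sales-summary-etl`, `example.org/etl/sales-summary:0.1.0`, 70% CPU, 1 to 5 replicas) are all hypothetical placeholders, not values from this article.

```yaml
# Hypothetical manifest for one ETL task; every name and number here is an assumption.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sales-summary-etl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sales-summary-etl
  template:
    metadata:
      labels:
        app: sales-summary-etl
    spec:
      containers:
        - name: sales-summary-etl
          image: example.org/etl/sales-summary:0.1.0  # placeholder image
          resources:
            requests:
              cpu: 250m      # CPU utilization targets are computed against this request
              memory: 128Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sales-summary-etl
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sales-summary-etl
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # add pods when average CPU exceeds 70% of the request
```

Giving each task its own Deployment and autoscaler keeps scaling decisions local to that task, so a heavy transformation step can grow without also scaling lightweight extract or load steps.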

Ballerina can work with all common CI/CD, monitoring, and deployment technologies, making it seamless to integrate Ballerina-based ETL flows with an organization’s existing infrastructure. For example, Ballerina ETL source code can be maintained in GitHub, CI/CD actions can be implemented using Jenkins, ETL flows can be deployed on Amazon EKS, and the executions can be monitored using Prometheus and Grafana.
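To make the monitoring part concrete, the fragment below is a minimal sketch of a Prometheus scrape configuration for one ETL task. It assumes the Ballerina service was built with observability enabled and exposes metrics at `/metrics` on port 9797 (verify both against your own configuration). The job name and target host are hypothetical.

```yaml
# Hypothetical Prometheus scrape job for a Ballerina-based ETL task.
scrape_configs:
  - job_name: "ballerina-etl"
    metrics_path: /metrics          # assumed metrics endpoint path
    scrape_interval: 15s
    static_configs:
      - targets:
          - "sales-summary-etl:9797"  # placeholder service name; port is an assumption
```

Grafana can then use this Prometheus instance as a data source to chart execution counts and latencies per ETL task.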


