How Cloud Custodian conquered cloud resource management

By admin

Sep 18, 2024



Kapil Thangavelu: Like so many large enterprises eight years ago, they were aggressively moving to the cloud and open source, and the mandate was to accelerate all the developers getting into the cloud environment. Obviously being in financial services, we were dealing with a highly regulated industry: every new cloud service had to have its certs signed off, everything configured correctly in REST. There were a ton of one-off scripts, it was easy to configure things incorrectly and create backlogs of problems, and then you had the other challenges of making sure things were tested and monitored consistently. It was obvious that this was not going to scale across hundreds of engineers and application teams. So we said, let’s create a DSL that can address these issues holistically across these dimensions. Let’s not just identify cloud problems, but figure out a language that would also let us fix them in real time. We designed Cloud Custodian to be a highly readable YAML DSL. We wanted this language and policy definition for cloud resources to be accessible across many different groups: to developers, to their managers, and even to the auditors in secondary lines like security. And we wanted it to be highly readable, because with cloud resources, as in coding, you’re always going to be reading much more than you write, so let’s make it as readable as possible.

Van: What would you say Cloud Custodian is known for today, in terms of the kinds of problems it solves?

Thangavelu: The initial focus areas were tagging, compliance, and security, along with workflows around cost. Cloud Custodian gives you a workflow where you can define things like grace periods for cloud resources, after which they shut off if unused. Those are the types of constructs for building logical workflows around cloud resources, as policies. Even today, eight years after open sourcing the project in 2016, Cloud Custodian’s claim to fame is being best in class in remediation. It doesn’t just let you admire problems, it’s designed to help you solve the problems in your cloud footprint. The big areas where it thrives are things like garbage collection and dealing with under-utilized cloud resources, right-sizing resources that may be overprovisioned, handling the life cycle of objects and buckets and all the reclamation policies that go with that, and making sure configurations are in line with the desired policies, pre-deployment. Those are some of the big areas, but Cloud Custodian also has things like blast radius protection and other types of tooling to help deal with the risks of remediation in production, which is always tricky.
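
To make the grace-period idea concrete, here is a minimal sketch of what such a pair of policies might look like in Cloud Custodian's YAML DSL. It is illustrative rather than taken from the interview: the policy names, the `Owner` tag, and the four-day window are assumptions, and it relies on the `mark-for-op` action and `marked-for-op` filter using their default `maid_status` tag.

```yaml
policies:
  # Step 1: find EC2 instances with no Owner tag (an assumed tagging rule)
  # that have not been marked yet, and schedule them to be stopped in 4 days.
  - name: ec2-missing-owner-mark
    resource: aws.ec2
    filters:
      - "tag:Owner": absent
      - "tag:maid_status": absent
    actions:
      - type: mark-for-op
        op: stop
        days: 4

  # Step 2: once the grace period has elapsed, stop the instances that
  # are still marked for the stop operation.
  - name: ec2-missing-owner-stop
    resource: aws.ec2
    filters:
      - type: marked-for-op
        op: stop
    actions:
      - stop
```

A real deployment would likely add a third policy that removes the mark once an owner fixes the tag, plus a notification step so the grace period is visible to the team.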


