Is your cloud provider working today? How good is digital transformation if a cloud service outage cripples your application environment? These questions are on the minds of many tech executives as they wrestle with building a cloud-integrated technology stack to run their enterprises’ most critical processes.
Addressing the problem is not easy, however. Right now, the onus is on customers to make their systems resilient in the face of cloud-based service outages, as there are no formal multivendor cloud agreements in which one provider handles failover for another. Many businesses have therefore explored replicating workloads across multiple clouds, which brings significant architectural challenges as well as high costs. Faced with that expense and complexity, some users observe that cloud regions act as a firebreak for most service outages and look at single-cloud, cross-region resilience options instead.
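To make the cross-region pattern concrete, here is a minimal sketch using boto3; the bucket names and the IAM role ARN are hypothetical placeholders, and both buckets are assumed to already exist with versioning enabled:

```python
import boto3

# Hypothetical sketch: enable S3 replication from a primary-region
# bucket to a standby bucket in another region. Bucket names and the
# IAM role ARN are placeholders; both buckets must already exist and
# have versioning enabled for replication to work.
s3 = boto3.client("s3", region_name="us-east-1")

s3.put_bucket_replication(
    Bucket="example-primary-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",
        "Rules": [
            {
                "ID": "cross-region-standby",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # replicate every object in the bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-standby-bucket"
                },
            }
        ],
    },
)
```

Because everything stays inside one provider's object model, this kind of configuration is straightforward. The moment a second provider enters the picture, that uniformity disappears.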
But in reality, could Azure work as a failover for Amazon Web Services (AWS) or Google Cloud Platform (GCP)? Could any form of cross-cloud resilience be offered at the provider layer?
I believe the architectural differences between the hyperscalers would limit this technically. Certainly, a few compatible services offer reasonably easy paths for cross-cloud conversion, including S3 bucket replication and cloud-to-cloud VM conversion. Yet unlike the power grid, which has standardized voltages, phases, and currents and well-understood ways to transform power so those characteristics match, the cloud providers offer fundamentally different services. Azure Functions is not the same as AWS Lambda. GCP Spanner is not the same as AWS RDS. When you try to translate applications at the code or database layer on the fly, too much can go wrong. If a few houses exploded every time the grid pulled power from a neighboring grid, nobody would call that resilient.
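Even the "easy" cases amount to glue code the customer must write and operate. As a rough sketch, assuming hypothetical bucket names and credentials already configured for both providers, mirroring objects from S3 into Google Cloud Storage looks like this:

```python
import boto3
from google.cloud import storage

# Hypothetical sketch: mirror objects from an S3 bucket into a Google
# Cloud Storage bucket. Bucket names are placeholders; credentials for
# both clouds are assumed to be configured in the environment. There is
# no shared API: each side needs its own client and its own data model.
s3 = boto3.client("s3")
gcs_bucket = storage.Client().bucket("example-gcs-standby")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-s3-primary"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket="example-s3-primary", Key=obj["Key"])["Body"]
        # Buffering whole objects in memory is fine for a sketch; a real
        # pipeline needs streaming, retries, and egress-cost budgeting.
        gcs_bucket.blob(obj["Key"]).upload_from_string(body.read())
```

Nothing in either SDK knows about the other cloud, so every pairing of services needs its own adapter, and the harder cases, such as serverless functions and managed databases, have no clean equivalent at all.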
You May Need Cloud-to-Cloud Failover, But Your Cloud Vendor Doesn’t
Beyond the technical incompatibilities, it is not in the hyperscalers' interest to backstop service failures for other clouds: doing so would require investment and could divert customer spending to rival vendors. For cloud providers, it is more advantageous to build out additional regions and add internal resilience tools such as the AWS Resilience Hub. Features like these shrink the blast radius of a service failure and provide internal tooling for moving workloads to other regions. Unless they are compelled to do so, whether by the market or by regulation, don't expect the hyperscalers to act as Band-Aids for one another.
In real life, this problem is being solved on the client side through VMware Cloud Foundation, Kubernetes, or some other form of cross-cloud abstraction. These technologies let companies manage their technology stacks across different clouds and on-premises data centers, giving them some freedom to choose where workloads run, for both resilience and cost reasons. The expertise needed to do this well is thin, however, and every business is competing for the same talent.
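As a rough illustration of what that abstraction buys, the sketch below, which assumes kubeconfig contexts with hypothetical names for clusters in two different clouds, applies one identical Deployment to both:

```python
from kubernetes import client, config

# Hypothetical sketch: Kubernetes as a cross-cloud abstraction. The
# same Deployment spec is applied to clusters in two clouds. The
# kubeconfig context names and the container image are placeholders.
def make_deployment() -> client.V1Deployment:
    container = client.V1Container(name="web", image="example.com/web:1.0")
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "web"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=template,
    )
    return client.V1Deployment(metadata=client.V1ObjectMeta(name="web"), spec=spec)

for context in ("aws-cluster", "gcp-cluster"):
    # Each context points at a cluster in a different cloud, but the
    # API and the object model are identical on both sides.
    api = client.AppsV1Api(api_client=config.new_client_from_config(context=context))
    api.create_namespaced_deployment(namespace="default", body=make_deployment())
```

The cloud-specific differences don't vanish; they get pushed down into cluster provisioning, storage classes, and load balancers, which is exactly where that scarce expertise gets spent.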
Take a Multipronged Approach to Achieve Multi-Cloud
Niche vendors are working to make multi-cloud management more practical, alongside similar, though uneven, efforts by the hyperscale cloud providers themselves. But there is no standard model for multi-cloud quite yet, and multi-cloud failover is harder still. Full convergence of the various clouds might sound like a panacea, but it could encourage monopolistic behavior and price fixing, which makes it undesirable. Even so, users will continue to press cloud providers to offer at least a degree of service interoperability for core infrastructure.
Anyone holding back on building for multi-cloud failover until the cloud providers step up may be waiting a long time. Users who want such capabilities for their most critical applications are well advised to take the initiative themselves. In addition to the VMware Cloud Foundation and Kubernetes options described above, users can build containerized applications with few or no dependencies on the underlying cloud. This approach is perhaps best suited to massively scalable web applications where uptime is critical but interaction with internal data is limited.
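At its simplest, that pattern reduces to health-checked failover between identical stateless deployments. The hostnames in this sketch are hypothetical, and in practice the logic usually lives in DNS or a global load balancer rather than in the caller:

```python
import urllib.error
import urllib.request

# Hypothetical sketch: client-side failover for a stateless app that is
# deployed identically in two clouds. The hostnames are placeholders.
ENDPOINTS = (
    "https://app.aws.example.com",
    "https://app.gcp.example.com",
)

def fetch(path: str, timeout: float = 2.0) -> bytes:
    """Try each cloud's deployment in order; return the first success."""
    last_error = None
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # this cloud is down or unreachable; try the next
    raise RuntimeError("all clouds failed") from last_error

if __name__ == "__main__":
    print(fetch("/healthz"))
```

The trick only works because the application carries no cloud-specific state; the moment it leans on a managed database or a proprietary queue, failover stops being this simple.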
Such cloud-spanning applications are rare outside of software-as-a-service providers. The clouds simply don’t plug and play with one another, by design.
The power grid analogy is not entirely wrong, however. Power grids in industrialized countries are interoperable and rarely suffer service disruption, yet when you travel internationally you often still need a power adapter: portability, and with it resilience, becomes your problem, not the grid's. The same holds true in the public cloud market today.
What to Read Next:
Special Report: How Fragile is the Cloud, Really?
Lessons Learned from Recent Major Outages
Emerging Tech to Help Guard Against the Malevolence of Cloud Outages