Alongside this, developers and IT operations staff will have to decide where they run generative AI workloads. Many companies will start in the cloud, as they want to avoid the burden of running their own LLMs, but others will prefer to run models themselves to retain control over their options and avoid lock-in. Whether you run on-premises or in the cloud, however, you will have to think about running across multiple locations.
Using multiple sites provides resiliency for a service; if one site becomes unavailable, the service can still function. For on-premises deployments, this can mean implementing failover and availability technologies around your vector data sets, so that the data can be queried whenever it is needed. For cloud deployments, running in multiple locations is simpler, as you can use different cloud regions to host and replicate vector data. Running across multiple sites also lets you serve responses from the site closest to the user, reducing latency, and makes it easier to meet data residency requirements if data has to stay in a specific location or region for compliance purposes.
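As a rough illustration of the failover pattern described above, the sketch below tries an ordered list of regions, nearest first, and falls back to the next one if a query fails. The region names and the query_vector_store helper are hypothetical placeholders for whichever vector database client you actually use.

```python
import logging

# Hypothetical ordered preference list: nearest region first,
# remaining regions as fallbacks. Adjust to your own topology.
REGION_PREFERENCE = ["eu-west-1", "eu-central-1", "us-east-1"]


def query_vector_store(region: str, embedding: list[float], top_k: int = 5) -> list[dict]:
    """Placeholder for a real vector database query against one region.

    In practice this would call your vector store's client SDK against
    an endpoint replicated into each region.
    """
    raise NotImplementedError("wire this up to your vector store client")


def query_with_failover(embedding: list[float], top_k: int = 5) -> list[dict]:
    """Try each region in preference order; return the first successful result."""
    last_error: Exception | None = None
    for region in REGION_PREFERENCE:
        try:
            return query_vector_store(region, embedding, top_k)
        except Exception as exc:  # in real code, catch the client's specific errors
            logging.warning("vector query failed in %s: %s", region, exc)
            last_error = exc
    raise RuntimeError("all regions unavailable") from last_error
```

The same preference-ordered list works for both on-premises sites and cloud regions; only the client behind query_vector_store changes.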
Ongoing operational overhead
Day-two IT operations means looking at the overheads and problems of running your infrastructure, and then removing bottlenecks or optimizing your approach to address them. Because generative AI applications involve huge volumes of data and many integrated components and services, it's important to consider the operational overhead that will build up over time. As generative AI services become more popular, issues may arise around how those integrations behave at scale. If you want to add more functionality or integrate more AI agents, those integrations will need enterprise-grade support.
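One small, concrete piece of that overhead is making each integration tolerant of transient failures. The sketch below assumes a hypothetical call_agent function standing in for any external AI service or agent call, and wraps it with bounded retries and exponential backoff with jitter.

```python
import random
import time


def call_agent(payload: dict) -> dict:
    """Placeholder for an external AI service or agent integration call."""
    raise NotImplementedError("replace with the real integration call")


def call_with_retries(payload: dict, max_attempts: int = 4, base_delay: float = 0.5) -> dict:
    """Retry a flaky integration with exponential backoff and jitter.

    This is the kind of plumbing every integration ends up needing once it
    is exposed to production traffic at scale.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call_agent(payload)
        except Exception as exc:  # catch the integration's specific errors in real code
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            time.sleep(delay)
```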