Cloud engineers are increasingly being asked to scrutinize their operational cloud spend. To contribute to the community, I'd like to outline the high-level initial assessment I typically use when examining costs. In this article I'll focus on low-hanging fruit in compute optimization; I'll cover data transfer and storage in a future article. While most of the concepts are AWS-centric, similar principles apply to the other hyperscalers, so the knowledge is transferable.
-
Tagging: Although tagging doesn't yield immediate benefits, it plays a crucial role in identifying value streams and their associated operational costs. By collecting this data, we gain insights into whether a service warrants reevaluation to deliver more value.
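To make the tagging point concrete, here is a minimal sketch of rolling spend up by a tag and surfacing whatever is still untagged. The `line_items` records, the `team` tag key, and the amounts are hypothetical placeholders; in practice the rows would come from whatever cost export you use.

```python
from collections import defaultdict

# Hypothetical billing line items (illustrative values, not real data).
line_items = [
    {"resource": "i-aaa", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "i-bbb", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "i-ccc", "cost": 50.0, "tags": {}},
    {"resource": "db-ddd", "cost": 200.0, "tags": {"team": "payments"}},
]


def cost_by_tag(items, tag_key, untagged_label="(untagged)"):
    """Sum cost per value of tag_key; items missing the tag are grouped together."""
    totals = defaultdict(float)
    for item in items:
        value = item["tags"].get(tag_key, untagged_label)
        totals[value] += item["cost"]
    return dict(totals)


totals = cost_by_tag(line_items, "team")

# Print the largest cost owners first; the untagged bucket shows the attribution gap.
for owner, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{owner:12s} {cost:8.2f}")
```

A large untagged bucket is usually the first thing to fix, since none of that spend can be attributed to a value stream.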
-
Sizing Optimizations: It's essential to assess whether the instances in use are the right size for the services they support. Workloads evolve over time, and the sizing chosen at launch may no longer match reality. Revisiting workloads and comparing allocated resources with actual utilization can uncover opportunities to reduce instance sizes. For serverless functions such as AWS Lambda, the opposite can apply: because Lambda allocates CPU in proportion to configured memory, increasing memory might shorten run time and reduce error rates such as timeouts. In the case of databases, architectural changes such as the introduction of supporting services or caching may lower request volumes, leaving once-appropriate large instances underutilized.
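As a rough illustration of the utilization review described above, the sketch below flags instances whose 95th-percentile CPU stays under a threshold. The instance names, the samples, and the 40% threshold are assumptions chosen for the example; a real review would also look at memory, network, and disk before downsizing anything.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsize_candidates(cpu_by_instance, p95_threshold=40.0):
    """Return names of instances whose p95 CPU utilization is below the threshold."""
    return sorted(
        name
        for name, samples in cpu_by_instance.items()
        if samples and percentile(samples, 95) < p95_threshold
    )


# Hypothetical CPU utilization samples in percent (illustrative only).
cpu = {
    "api-1": [12, 18, 22, 15, 30, 25],
    "batch-1": [35, 80, 92, 60, 75, 88],
    "db-1": [5, 7, 6, 9, 8, 10],
}

candidates = downsize_candidates(cpu)
print(candidates)
```

Using a high percentile rather than the average is a deliberate choice here: it keeps bursty workloads such as `batch-1` off the list even when their mean utilization looks low.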
-
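Upgrade to ARM: ARM-based instance types (AWS Graviton, for example) can deliver comparable or superior performance at a lower cost. Moving to ARM is a straightforward decision if your service can run on this architecture, which mainly means that your runtime, dependencies, and container images are available as ARM builds and that you have benchmarked the workload on it.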
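One way to frame the comparison is cost per unit of work rather than hourly price alone. The sketch below assumes you have benchmarked both architectures yourself; the hourly prices and throughput figures are made-up placeholders, not real AWS prices.

```python
def cost_per_million_requests(hourly_price, requests_per_second):
    """Cost of serving one million requests at a sustained throughput."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1_000_000


# Hypothetical benchmark results and prices (placeholders, not real pricing).
x86_cost = cost_per_million_requests(hourly_price=0.10, requests_per_second=1000)
arm_cost = cost_per_million_requests(hourly_price=0.08, requests_per_second=1000)

# Fraction saved per unit of work when moving from x86 to ARM.
savings = 1 - arm_cost / x86_cost
print(f"x86: ${x86_cost:.4f}  ARM: ${arm_cost:.4f}  savings: {savings:.0%}")
```

If the ARM instance is cheaper per hour but slower for your workload, this figure can shrink or turn negative, which is why the benchmark matters more than the price list.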
-
Savings Plans and Reservations: It's vital to check how much of your steady-state usage is covered by savings plans and reservations. If you have long-running resources that are billed at on-demand rates, it's prudent to cover them. AWS, for instance, offers one-year and three-year terms with no-upfront, partial-upfront, and all-upfront payment options.
Note: When optimizing and upgrading, it's crucial to check which reservations or savings plans you already hold. Changing instance types for rightsizing or moving to ARM may look cost-effective, but if an existing commitment is tied to the old instances, you can end up with an underutilized reservation that you're still paying for. How exposed you are depends on the commitment type: on AWS, Compute Savings Plans apply across instance families, whereas EC2 Instance Savings Plans are tied to a specific instance family in a Region. This underscores the importance of ongoing financial operations (FinOps), which involves continuously monitoring opportunities as they arise. In this context, the expiration of a reservation or savings plan could present a new opportunity.
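The interaction between coverage and underutilized commitments can be sketched with a deliberately simplified model. The function and the monthly figures below are hypothetical; actual billing applies commitments hourly and with more rules than this captures.

```python
def commitment_summary(commitment, eligible_usage):
    """Summarize a commitment against the usage it can apply to.

    Both arguments are in the same unit (for example USD per month at
    discounted rates). This is a simplified model, not the provider's billing logic.
    """
    used = min(commitment, eligible_usage)
    return {
        "utilization_pct": 100 * used / commitment if commitment else 0.0,
        "coverage_pct": 100 * used / eligible_usage if eligible_usage else 0.0,
        "unused_commitment": commitment - used,
    }


# Hypothetical figures: before a migration, usage exceeds the commitment.
before = commitment_summary(commitment=1000, eligible_usage=1200)

# After moving part of the fleet to instances the commitment does not cover,
# eligible usage drops below what has already been committed.
after = commitment_summary(commitment=1000, eligible_usage=700)

print("before:", before)
print("after: ", after)
```

In the first case the gap is in coverage (some usage is still billed on demand); in the second the gap is in utilization, and the unused part of the commitment is paid for regardless. Timing a migration around the commitment's expiry avoids the second case.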
This high-level overview primarily focuses on compute optimization. If you're interested in exploring other high-level strategies or delving into specific tools and techniques, let me know.