Chapter 9. The FinOps Lifecycle

Back in Chapter 1, we discussed the core principles of FinOps. The principles help guide actions, and in Chapter 7 we discussed the FinOps Framework, which was built on those principles. This chapter covers the ongoing, iterative phases of the FinOps lifecycle and how to apply them to the capabilities of FinOps. The principles haven’t changed much since the first edition of this book, but we have seen many companies and organizations build and iterate on top of them, making them their own. Much like the FinOps Framework, which is a set of building blocks meant to be assembled in whatever way is relevant to your organization, the principles are an open source bedrock on which you can build a FinOps plan for your organization.

The Six Principles of FinOps

Again, the principles are as follows:

  • Teams need to collaborate.

  • Decisions are driven by the business value of cloud.

  • Everyone takes ownership of their cloud usage.

  • FinOps reports should be accessible and timely.

  • A centralized team drives FinOps.

  • Take advantage of the variable cost model of the cloud.

Let’s look at how each principle plays out in action with real-world ramifications, and later we’ll dig into how specific framework capabilities are designed to leverage them to achieve specific results.

#1: Teams Need to Collaborate

First and foremost, FinOps is a cultural change that focuses on breaking down the silos between teams that historically haven’t worked closely together. When done well, the finance team uses language and reporting that moves at the speed and granularity of cloud, product managers fine-tune their application scaling forecasts to accommodate expected income from new features, while engineering teams consider cost as a new efficiency metric.

At the same time, the FinOps team works to continuously improve agreed-upon metrics for efficiency. They help define governance and parameters for cloud usage that provide some control, but focus first on ensuring innovation and speed of delivery can flourish alongside cost efficiency.

The addition of a blameless culture to this principle enables the company to learn from mistakes. Taking away the need for a person/team to blame for a cost overrun allows a postmortem to focus instead on how an overrun can be avoided in the future and what changes the organization needs to make to learn from this incident.

If a culture of finger pointing and shaming for “doing” the wrong thing prevails, people will not bring issues to light for fear of punishment.

Betsy Beyer et al., Site Reliability Engineering (O’Reilly)

#2: Decisions Are Driven by the Business Value of Cloud

Think first about the business value of cloud spend, not the cost. It’s easy to think of cloud as a cost center, especially when the spend reaches material levels. The cloud is a value creator, but the more you use it, the more cost it will incur. The role of FinOps is to help maximize the value created by that spend. Instead of focusing on the cost per month, focus on the cost per business metric, and always make decisions with the business value in sight.
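To make "cost per business metric" concrete, here is a minimal Python sketch. The metric (rides served), the class and function names, and every figure are assumptions made up for illustration; substitute whatever unit your business actually sells.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    month: str
    cloud_spend: float  # total cloud spend for the month, in dollars
    rides: int          # hypothetical business metric: rides served


def cost_per_unit(snapshot: MonthlySnapshot) -> float:
    """Return cloud spend divided by the business metric for one month."""
    if snapshot.rides <= 0:
        raise ValueError("business metric must be positive")
    return snapshot.cloud_spend / snapshot.rides


# Made-up figures: total spend rises 20%, yet the cost per ride falls.
history = [
    MonthlySnapshot("2024-01", 100_000.0, 1_000_000),
    MonthlySnapshot("2024-02", 120_000.0, 1_500_000),
]
unit_costs = [cost_per_unit(s) for s in history]
```

In this invented example the monthly bill grows while the unit cost drops from $0.10 to $0.08 per ride, which is the kind of signal a cost-per-month view would hide.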

#3: Everyone Takes Ownership of Their Cloud Usage

Cloud costs are based on cloud use, which comes with a straightforward correlation: if you’re using the cloud, you are incurring costs and thus are accountable for cloud spending. Embrace this fact by pushing cloud spend accountability to the edges of your organization, all the way to individual engineers and their teams. And give them the information and guidance to be able to do this important job for the organization.

#4: FinOps Reports Should Be Accessible and Timely

In the world of per-second—or even microsecond—compute resources, unlimited cloud storage, shared Kubernetes clusters, automated deployments, and services that can incur costs based on externally controlled triggers, monthly or quarterly reporting of cloud spending isn’t good enough. Real-time decision making is about getting data—such as spend changes or anomaly alerts—quickly to the people who deploy and manage cloud resources.

As discussed in Chapter 8, FinOps data should be put in the path of the people making infrastructure decisions, informing them of the information they need without adding the extra effort for them to find it. Real-time decisions enable these people to create a fast feedback loop through which they can continuously improve their spending patterns, make intelligent decisions, and improve efficiency.

Focus relentlessly on clean data to drive decisions. FinOps decisions should be based on fully loaded and properly allocated costs. The costs should be amortized to include any prepayments made as part of commitment programs and should reflect the actual discounted rates a company is paying for cloud resources. They should also equitably factor in shared costs and be mapped to the business’s organizational structure. Without these adjustments to your spending data, your teams will make decisions based on incorrect data and hamstring value creation.
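Two of the adjustments described above, amortizing a prepayment and distributing a shared cost, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed method: it uses straight-line amortization and splits shared costs in proportion to direct spend, and the function names, team names, and dollar amounts are all hypothetical.

```python
def amortized_daily_cost(upfront_payment: float, term_days: int,
                         recurring_daily_fee: float = 0.0) -> float:
    """Spread an upfront commitment payment evenly over its term."""
    if term_days <= 0:
        raise ValueError("term_days must be positive")
    return upfront_payment / term_days + recurring_daily_fee


def allocate_shared_cost(shared_cost: float,
                         direct_spend: dict[str, float]) -> dict[str, float]:
    """Split a shared cost across teams in proportion to their direct spend."""
    total = sum(direct_spend.values())
    if total <= 0:
        raise ValueError("total direct spend must be positive")
    return {team: shared_cost * spend / total
            for team, spend in direct_spend.items()}


# Made-up numbers: a $3,650 one-year prepayment and a $1,000 shared support bill.
direct = {"checkout": 6000.0, "search": 4000.0}
daily = amortized_daily_cost(upfront_payment=3650.0, term_days=365)
shares = allocate_shared_cost(1000.0, direct)
fully_loaded = {team: spend + shares[team] for team, spend in direct.items()}
```

Proportional-to-spend is only one allocation basis; an equal split or a usage metric such as compute hours would change `shares` but not the overall shape of the calculation.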

#5: A Centralized Team Drives FinOps

Cultural change works best with a flag bearer. A central FinOps function drives best practices into the organization through education, standardization, and evangelism. This centralized team is where you find the subject matter experts who make the changes to culture through advocacy and education. The FinOps team improves the available data via better tooling and modifies the business processes that enable your organization to do FinOps. You maximize the results from rate optimization efforts by centralizing them, which gives your teams on the edge the freedom to maximize the results from usage optimization. Remember, the most successful companies decentralize responsibility to use less, and centralize responsibility to pay less.

FinOps practitioners use performance benchmarking to provide context for how well their organization is performing. Cloud performance benchmarking gives a company objective evidence on how well it’s doing. Benchmarking lets teams know whether they’re spending the correct amount or whether they could be spending less, spending differently, or spending in a better way. Companies should use both internal benchmarks to determine how individual teams compare to each other in key areas such as optimization, and external benchmarks based on industry standards to compare the company as a whole to others like it.

#6: Take Advantage of the Variable Cost Model of the Cloud

In the decentralized world of the cloud, planning for capacity moves from a forward-looking “What are you going to need to cover demand?” perspective to a real-time “How can we stay within our budget given what we’re already using?” perspective. Instead of basing capacity purchases on possible future demand, base your rightsizing, volume discounts, and RI/SP/CUD purchases on your actual usage data. Since you can always purchase more capacity to fit demand, the emphasis shifts to making the most of the services and resources you’re currently using, and to getting by with as few of them as you can for as long as you can.

As your cloud practice matures, aim to take advantage of cloud native services that scale with demand and models such as spot instances that can leverage low-cost resources when they are needed.

The FinOps Lifecycle

Now that we’ve detailed the core principles, let’s explore how they’re implemented across three distinct phases: inform, optimize, and operate (see Figure 9-1). These phases aren’t linear—you should plan to cycle through them constantly:

  1. The inform phase gives you the visibility for allocation and for creating shared accountability by showing teams what they’re spending and why. This phase enables individuals to see the impact of their actions on the bill.

  2. The optimize phase empowers your teams to identify and measure efficiency optimizations, such as rightsizing, matching storage tiers to access frequency, or improving RI coverage. Goals are then set for the identified optimizations, aligned with each team’s area of focus.

  3. The operate phase defines and implements processes that achieve the goals of technology, finance, and business. Automation can be deployed to enable these processes to be performed in a reliable and repeatable manner.

Figure 9-1. The FinOps lifecycle

The lifecycle is inherently a loop and is never complete. The most successful companies take an approach of gradual improvement and get a little better each time they go through it, building muscle memory through repetition and practice.

In each phase of the lifecycle you will take actions and perform activities based on the state of your cloud spend and your FinOps practice. Chapter 7 covered the FinOps Foundation Framework, which describes these aspects of the FinOps operating model. We described this using the metaphor of a garden where you will pivot to the tasks that need doing each day—watering, mulching, and weeding—based on the state in which you find the garden at that time.

The inform phase is where you will look at the state of your FinOps garden. The optimize phase is where you will look at which capabilities offer actions you might take to make your FinOps garden healthier. And the operate phase is where you will take those actions in your organization’s environment to ensure your garden flourishes.

Some of the capabilities lend themselves well to a certain phase of the lifecycle, and others might be used in more than one or all the phases.

Let’s review each phase and some of the actions you’ll take as you loop through the lifecycle. It’s important to note that you won’t perform all actions during every pass through the lifecycle. Chapter 1 covered the Prius Effect, which translates real-time feedback loops into data-driven decision making. Similarly, each pass through the lifecycle should be informed by the latest data you have to focus your efforts on the smallest set of most important activities needed at that time.

Inform

The inform phase is where you start to understand your costs and the drivers behind them. By giving teams visibility into their costs on a near-real-time basis, you drive better behavior. During the inform phase, you get visibility into cloud spend, drill down into granular cost allocation, and create shared accountability. Teams learn what they’re spending on what services, and why, by using various benchmarks and reporting. For the first time, individuals can see the impact of their actions on the bill.

Some of the activities in this phase include:

Map spending data to the business
Before you can implement accurate chargeback, you must properly map spend data to the organizational hierarchy by cost centers, applications, and business units. Tags and accounts set up by engineering teams are often not aligned to the view of the world that finance teams need, nor do they include the proper roll-ups that executives require.
Create showback (and other) reporting
To push spend accountability to the edges of the organization, you must show each group what they are spending via a showback or similar model like chargeback. This visibility is the essential building block of driving ownership of spending and ultimately deriving the most value from it.
Define budgets and forecasts

The FinOps team should provide the data needed for teams to generate forecasts of cloud usage for different projects and propose budgets for each. These budgets and forecasts should consider all aspects of a cloud architecture, including cloud native services, containers, and related costs. Managing teams to budgets lets you know when to lean in with optimization or spend remediation help. It also enables a conversation about why spending has changed.

Forecasting of spend should be done for each team, service, or workload based on fully loaded costs and properly allocated spending, with the ability to model changes to the forecast based on different inputs such as history and cost basis.

Define your account strategy
The way your organization distributes and uses AWS accounts, Google Cloud projects, Azure subscriptions/resource groups, or other hierarchical groupings will have a large impact on how you can do cost allocation, one of the critical capabilities used in the inform phase. A lot of the heavy lifting of cost allocation can be accomplished by having a consistent and logical way of distributing accounts so they can be used as boundaries for cost, as they are for security, resources, and other technical purposes.
Set tag strategy and compliance
Your metadata strategy (tagging and labeling), covered in more detail later, is both an art and a science. Even with a strong account hierarchy, it’s critical to get early alignment on a supporting tag strategy to get more granular. Without this, tag definition is left to the masses (or left out), and tag sprawl quickly makes any tags unusable.
Identify untagged (and untaggable) resources
There are two types of organizations: those who have untagged resources and those who have fooled themselves into thinking they do not. Assigning untagged resources to teams or workloads—and applying a meta layer of allocation to untaggable ones—is critical to proper chargeback, visibility, and later optimization.
Allocate shared costs equitably
Shared costs like support and shared services should be allocated at the appropriate ratio to responsible parties. There are a few methods of doing this, including sharing them equally or assigning them based on a usage metric like spend or compute hours. Leaving them in a central cost bucket is generally less desirable, as teams then don’t see the true cost of their applications.
Dynamically calculate custom rates and amortizations
Accurate spend visibility requires that companies factor in any custom negotiated rates, that discounts from RIs/SPs/CUDs are applied, and that amortized prepayments from RIs/SPs/CUDs are applied. This ensures teams are tracking the right spend numbers and aren’t surprised if their finance team’s bill doesn’t match their daily spend reporting.
Analyze trending and variance
Identifying spend drivers often requires ad hoc comparisons of time periods and the ability to report at a high level (e.g., cost center) all the way down to resources (or containers, functions, etc.) to understand cost drivers.
Create scorecards
Scorecards let the FinOps team benchmark how different project teams are doing in terms of optimizing cost, speed, and quality. They’re a quick way of looking for areas that can be improved, which should be done using the fully loaded and properly allocated cost mentioned previously.
Benchmark against industry peers
Building on the concept of internal scorecards, more advanced FinOps teams extend their benchmarking to make comparisons with other industry peer-level spend data to identify their relative efficiency using a normalized set of spend characteristics.
Identify anomalies
Anomaly detection isn’t just about identifying expense thresholds—it’s also important to identify unusual spikes in usage. Given the dramatic rise in the variety of variably charged services available from cloud providers, anomaly detection that watches for any deviations in spend helps you find the needle in the haystack that may need quick remediation.
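As a rough illustration of the anomaly detection described in the last activity above, the following Python sketch flags any day whose spend sits far from the mean of the preceding days. The trailing-window z-score approach, the function name, the default window and threshold, and the sample data are all assumptions chosen for brevity; commercial tooling typically uses more robust models.

```python
from statistics import mean, stdev


def find_spend_anomalies(daily_spend: list[float], window: int = 7,
                         threshold: float = 3.0) -> list[int]:
    """Return indexes of days whose spend deviates sharply from the trailing window."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if daily_spend[i] != baseline:
                anomalies.append(i)
            continue
        if abs(daily_spend[i] - baseline) / spread > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily spend in dollars, with one obvious spike on day index 7.
spend = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 250.0, 101.0]
flagged = find_spend_anomalies(spend)
```

One known weakness of this simple version is that a spike inflates the trailing window for the days that follow it, so a second anomaly shortly after the first can be masked.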

Optimize

The optimize phase identifies measured improvements to your cloud and sets goals for the upcoming operate phase. Cost-avoidance and cost-optimization targets come into play during this phase, with cost avoidance being the first priority.

Processes are required to set and track the near-real-time business decisions that enable your organization to optimize its cloud. We’ll also look at the cloud service provider’s offerings that can help to reduce cloud costs. This phase includes the following activities:

Analyze KPIs and set goals
During the optimize phase, you review your KPIs and set goals for achieving them incrementally. By understanding what you are trying to achieve overall, you can set interim goals that get you there over time. This helps you scope the opportunities you find and document them for action in the operate phase later.
Find and report on underutilized services
Once teams can see their properly allocated spend and usage, they can start to identify unused resources across all major drivers of spend (e.g., compute, database, storage, or networking). You can measure potential savings based on generated recommendations to eliminate things that aren’t being used, to scale things that are used cyclically, to rightsize things that are chronically the wrong size, or to rearchitect things that are operating poorly.
Evaluate centralized commitment-based discount options
As a cost-reduction measure, the FinOps team can evaluate metrics on existing AWS/Azure RIs, AWS SPs, or Google Cloud CUDs/Flexible CUDs to make sure the ones they have are being used effectively and then look for opportunities to buy more (or sell or modify existing ones). They track commitments and reservations, analyzing the entire portfolio across the enterprise to account for and understand the efficiency of usage and cost-avoidance actuals, complete with visibility into upcoming expirations.
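Two metrics commonly used when evaluating a commitment portfolio are utilization (how much of the purchased commitment was consumed) and coverage (how much eligible usage ran under a commitment). The Python sketch below assumes those definitions and measures both in instance hours; the function names and figures are hypothetical, and real providers report these values in their own units.

```python
def commitment_utilization(committed_hours: float,
                           used_committed_hours: float) -> float:
    """Share of purchased commitment that was actually consumed (0.0 to 1.0)."""
    if committed_hours <= 0:
        raise ValueError("committed_hours must be positive")
    return min(used_committed_hours, committed_hours) / committed_hours


def commitment_coverage(covered_hours: float,
                        total_eligible_hours: float) -> float:
    """Share of eligible usage that ran under a commitment discount (0.0 to 1.0)."""
    if total_eligible_hours <= 0:
        raise ValueError("total_eligible_hours must be positive")
    return min(covered_hours, total_eligible_hours) / total_eligible_hours


# Made-up month: 720 committed hours, 648 of them used, against 1,000 eligible hours.
utilization = commitment_utilization(committed_hours=720, used_committed_hours=648)
coverage = commitment_coverage(covered_hours=648, total_eligible_hours=1000)
```

Read together, high utilization with low coverage suggests room to buy more, while low utilization suggests the existing commitments should be modified or sold before any new purchase.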

Operate

Whereas the optimize phase sets the goals for improving, the operate phase sets up the processes for taking actions to achieve those goals. This phase also stresses continuous improvement of processes. Once automations are in place, management takes a step back to ensure spending levels are aligned with company goals. It’s a good time to discuss particular projects with other FinOps team members to determine whether the team should make some changes. Here are some of the activities that take place during the operate phase:

Deliver spend data to stakeholders
Creating the Prius Effect discussed in Chapter 1 requires stakeholders to regularly see how they’re tracking against their budgets. Daily or weekly visibility gives them a feedback loop that enables them to make the right decisions for the business. During the operate phase you focus on how these reports are delivered to stakeholders, building out the processes and automation to generate the reports and make them available.
Make cultural changes to align with goals
Teams are educated and empowered to understand, account for, and partner with other organizational teams to drive innovation. Finance teams are empowered to be bold change agents who move out of blocking investment and into partnering with the business/tech teams to encourage innovation. Each team must constantly and iteratively improve their understanding of the cloud and level up their reporting efficiency.
Rightsize instances and services (or turn them off)
During the optimize phase, you might discover that you’re paying for more powerful compute resources than you need. Recommendations that have been generated are acted upon during the operate phase. Engineers review the recommendations and, where appropriate, make adjustments—for example, switching to less powerful, less expensive instances; replacing unused storage with smaller sizes; or using different tiers of storage that better match access needs. Mature FinOps teams do this across all major drivers of spend.
Define governance and controls for cloud usage
Remember that the primary value of the cloud is speed of delivery, fueling greater innovation. At the same time, cost must be considered, so mature companies are constantly evaluating their agreed-upon guardrails on how and what types of cloud services can be consumed to ensure that they aren’t hampering innovation and velocity. Overdo control, and you lose the core benefits of moving to the cloud.
Continuously improve efficiency and innovation
These are ongoing, iterative processes of refining targets and goals to drive better business outcomes. We call this metric-driven cost optimization, as discussed in Chapter 22. Instead of running optimization actions on a fixed cadence (which is prone to inefficiency or human neglect), metric-driven cost optimization defines key metrics with target thresholds attached, then monitors them to trigger future optimization actions.
Automate resource optimization
Mature teams move toward programmatic detection of changes needed for incorrectly sized resources and offer the ability to automatically clean up underutilized ones.
Integrate recommendations into workflows
Mature teams stop requiring application owners to log in to see recommendations and begin pushing items like architecture recommendations, rightsizing recommendations, modernization opportunities, and other improvement opportunities into sprint planning tools. Some of this information will be tactical cost-optimization data, but some may be strategic in nature, helping teams understand how their architecture and build choices can design in cost efficiency and avoid costs in the first place.
Integrate chargeback into internal systems
Once chargeback has been implemented and visibility given to teams, mature FinOps teams then integrate that data programmatically into their relevant internal reporting systems and financial management tools via application programming interfaces (APIs).
Establish policy-driven tag cleanup and storage lifecycle policies
Mature teams begin to programmatically clean up tags through policies like tag-or-terminate or tag-on-deploy. They also implement policy-driven storage lifecycles to ensure data is stored in the most cost-effective tier automatically.
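A tag-or-terminate policy like the one mentioned in the last activity above can be reduced to a small decision function. The following Python sketch is an illustration only: the required tag keys, the grace period, the action names, and the `Resource` shape are all assumptions, and a real policy would act through your cloud provider's or policy engine's own APIs rather than returning strings.

```python
from dataclasses import dataclass, field

REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical required tag keys
GRACE_PERIOD_DAYS = 7                     # hypothetical grace period


@dataclass
class Resource:
    resource_id: str
    age_days: int
    tags: dict[str, str] = field(default_factory=dict)


def tag_policy_action(resource: Resource) -> str:
    """Decide what a tag-or-terminate policy would do with one resource."""
    missing = REQUIRED_TAGS - set(resource.tags)
    if not missing:
        return "compliant"
    if resource.age_days <= GRACE_PERIOD_DAYS:
        # Young resources get a warning so owners can fix tags first.
        return "notify-owner"
    return "flag-for-termination"


# Made-up inventory covering each outcome.
inventory = [
    Resource("vm-1", 30, {"owner": "team-a", "cost-center": "cc-100"}),
    Resource("vm-2", 2, {"owner": "team-a"}),
    Resource("vm-3", 30),
]
actions = {r.resource_id: tag_policy_action(r) for r in inventory}
```

Returning a proposed action rather than terminating directly keeps a human or an approval step in the loop, which tends to matter while a policy is still new.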

Considerations

There are a few key considerations you should review in your FinOps practice when beginning your journey through the lifecycle. They fall along the key ideas of FinOps: having a clear understanding of your spend, creating a company-wide movement, driving innovation, and, ultimately, helping the business reach its goals.

You will want to evaluate the following:

Unit economics

An important step is to tie cloud spend to actual business outcomes. If your business is growing and you’re scaling in the cloud, it’s not necessarily a bad thing that you’re spending more money. This is especially true if you know what the cost is to service a customer and you’re continuously driving it down. Tying spend metrics to business metrics is a key step in your FinOps journey.

Unit economics provide a clear, common lexicon so that all levels of the organization can discuss cloud spending in a meaningful way. Instead of management setting arbitrary spend goals, it can set targets that are tied to outcomes. The management advice becomes “Don’t worry about the total bill; just make sure you’re driving down the cost per ride” instead of the more restrictive “Spend less on cloud.”

Culture
The operate phase is a good time to evaluate how well FinOps culture is being adopted. Problems such as inefficient utilization of resources or inadequate RI coverage are often due to poor communication and siloed organizations. Look for signs that teams are being proactive, designing cloud services that use the cloud efficiently. Assess whether staff are completing the available FinOps training and whether communication between finance and engineering is flowing smoothly.
Speed of delivery
Speed of delivery is controlled by the trade-off between cost and quality. Management might want to discuss particular projects with FinOps team members to decide whether they want to adjust those two levers to see if they can increase the speed of delivery.
Value to the business
Again, management may want to evaluate whether cloud spend reflects the value of the project to the business. This is another opportunity to discuss particular projects with FinOps team members to decide if they want to continue to operate them as they have been or if they can make some changes.

Where Do You Start?

You start by asking questions that kick-start the inform phase. Think of the FinOps lifecycle as a closed-loop racetrack—you can jump in at any point, and you’ll eventually loop back around. However, we recommend you start at inform before you get into optimize or operate. Gain visibility into what’s happening in your cloud environment and do the hard—but important—work of cleaning up your allocation so that you know who is truly responsible for what before you start making changes.

And no matter where you are in the lifecycle, you should be continually focused on culture and governance. The true power of FinOps comes from combining the actions and tools with cultural shifts that change how your whole organization relates to using the cloud.

Whatever you do, don’t try to boil the ocean. A few years ago, we saw a major retailer try to go from 0% to 80% RI coverage in a single purchase. The company studied its infrastructure, consulted its engineering teams, checked its operating systems, and made a $2 million purchase. Managers high-fived each other on their awesomeness and then went back to work for the next few weeks. The next month the cloud bill was considerably higher, and the VP was furious. Upon review, the company found it had purchased the wrong RIs in the wrong operating system due to naiveté about how BYOL (“bring your own license”) models are applied. That same retailer is now at 80% coverage, but it took a multiyear effort to uplevel finance teams and business units who were gun-shy after the earlier disaster. Take your time. Like anything, building competence takes time but can be accelerated by learning from those who have gone before you.

You Don’t Have to Find All the Answers

Before you start telling teams to turn off this resource or downsize that one, you must get a true sense of what the cost drivers are and let the teams see the impact of their spending on the business.

This will drive some surprising, autonomous results. We learned about a great example of this via a Slack message from a team member, where a manufacturing company enabled six-figure-a-year savings simply by showing a team what they were spending (see Figure 9-2).

Figure 9-2. A real conversation about the results of cost visibility

The best part of this story is that FinOps didn’t make any recommendations to the team. All they did was shine a light on the team’s cloud usage. The team took charge to make improvements based on their understanding of the infrastructure cost. This is why you push reduction of usage out to the teams responsible for spending.

Conclusion

Remember, mastering the FinOps lifecycle is an iterative process that takes years of education and process improvement in a large enterprise.

To summarize:

  • The FinOps lifecycle comprises three main phases that you continuously cycle through.

  • Improve with each iteration of the lifecycle—don’t try to do everything at once.

  • Involve all your cross-functional teams early and often so they can learn with you.

  • Constantly look for opportunities to refine your processes, but move quickly from phase to phase.

  • The most critical thing you can do is provide your teams with granular, real-time visibility into their spending.

  • Before you can do anything else, you need to fully load and allocate your costs, factoring in your custom rates, filling allocation gaps, distributing shared costs, remapping the spend to your organizational structure, and accounting for amortizations.

This may sound like a lot of work, but it’s actually easy to get started. Next up, we’ll work through the first phase of the lifecycle so you can start addressing the questions you need to answer.
