Chapter 10. Inform Phase: Where Are You Right Now?

Begin your FinOps practice by asking questions. That’s what the inform phase is all about. As you find answers to those questions, you can start evaluating the state of your cloud. In this chapter, we’ll look at some of the questions you should start with, and we’ll get an aspirational glimpse of what great looks like across some common capabilities. All of that will help you know where to focus when you get to the optimize phase.

As we’ve stated, FinOps isn’t a linear process; it’s a cycle. The visibility you get in the first phase is essential to equip you to enter the next phase. You’ll spend the majority of your time in the inform phase. It’s easy to make costly mistakes if you jump to optimize too quickly. As carpenters have reminded us since the craft of building began, “measure twice, cut once.” Measuring informs you so you can make changes, and then you measure those changes against your operational metrics, and so on. You’ll always circle back around to the inform phase to check in again on which actions will be beneficial to the business.

Tip

The first question you should try to answer is: “What’s the total cloud spend, its forecast, and its growth rate?” If those numbers are immaterial to the business, it may be time to consider pausing after the most basic reporting is in place. Recall the informed ignoring concept covered in Chapter 2.
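As a rough illustration of that first question, the sketch below computes total spend, the average month-over-month growth rate, and a naive next-month forecast from a list of monthly totals. The function name `summarize_spend` and the sample figures are hypothetical, and applying the average growth rate to the latest month is an assumption made for illustration, not a forecasting method this chapter prescribes.

```python
from statistics import mean

def summarize_spend(monthly_spend):
    """Return total spend, average month-over-month growth, and a naive forecast.

    monthly_spend: monthly totals in chronological order (hypothetical data).
    """
    if len(monthly_spend) < 2:
        raise ValueError("need at least two months to compute growth")
    # Month-over-month growth for each consecutive pair of months.
    growth_rates = [
        (curr - prev) / prev
        for prev, curr in zip(monthly_spend, monthly_spend[1:])
    ]
    avg_growth = mean(growth_rates)
    # Naive forecast (assumption): apply the average growth to the latest month.
    forecast = monthly_spend[-1] * (1 + avg_growth)
    return {
        "total": sum(monthly_spend),
        "avg_growth": avg_growth,
        "next_month_forecast": forecast,
    }

summary = summarize_spend([100_000, 110_000, 121_000])
print(summary)
```

If the totals and growth rate that come out of a calculation like this are immaterial to the business, that is the signal to consider pausing after basic reporting.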

Data Is Meaningless Without Context

Of course, finding the data isn’t enough. You have to interpret it. Your goal is to educate yourself. You need to start conversations with your colleagues in finance, engineering, and the line of business (LoB) about what you want to accomplish.

To do this, skilled FinOps practitioners will ask questions that will set them up to build the appropriate allocation constructs, as discussed in Chapter 11. Remember, you’re not trying to boil the ocean. You want to identify questions that will be answered throughout the FinOps lifecycle.

This process also refines the common language that we discussed in Chapter 4. It prevents a lack of trust in the data, as well as the finger-pointing that often follows when there isn’t alignment between teams.

Any improvements made anywhere besides the bottleneck are an illusion.

Gene Kim, The Phoenix Project

Even the most experienced FinOps practitioners—those who have built multiple mature practices—follow a gradual maturing of FinOps process when setting up a practice. Your initial goal is to get some easy wins, harvest the low-hanging fruit, build muscle memory, and gain credibility. Think of DevOps, not Waterfall. Think of Gene Kim, not Winston W. Royce.

Seek First to Understand

As you start this phase, you may immediately find some waste. It’s understandable and natural to want to run out and fix it. However, ask first, “Where is the waste coming from?” This will help you to identify the root cause and prevent it from happening next time. In other words, you should resist the temptation to jump straight to making changes in your cloud estate and instead focus first on answering the questions.

One of the most effective ways to determine the questions you need to ask is to interview the various stakeholders in the organization. When you understand what they’re concerned about, you can use that knowledge to build the context needed for your reporting and allocation constructs.

Here is a set of questions to help you get started:

What do you want to report on? Is it cost centers, applications, products, business units, or something else?
Where is the bulk of spending coming from, i.e., from which set of services?
Are you going to do chargeback? Are you going to do showback? There are benefits in each, but either way helps to drive accountability.
If you’re going to centralize rate management, is the common good a priority, or is it more important to recoup costs by chargeback?
Is seeing spending trends enough, or do you need to accurately charge back to the penny? Trends are the goal in the beginning; later, granular allocation becomes important.
How will you account for changes in cost centers? Early on, you may have a spreadsheet or a configuration management database (CMDB), but later you will have a meta layer of allocation data in your FinOps platform to dynamically remap against your organization’s ever-changing structure.
How will you account for people or applications shifting between teams, or recombining into different teams? After they switch, how will you get the information that they’ll now care about, since their slice of the data will be different?
How will you notify people that there have been changes to the allocation constructs?
What are the tags you really need? Early on it may be only three, but later there may be dozens. However, if you do expand to dozens, it’s likely you have too many tags, and a future loop through the lifecycle might see you optimizing the number of tags based on the updated goals set in the optimize phase.
Will you do things like “lunch and learns” from your CCoE to regularly present the best practices and get people excited?

As you move around the FinOps loop, you answer more questions. Are you efficient? You’ll need to go through the loop to find out. And after a few times around, you’ll realize that you never actually arrive at the end. You just get better, and your questions get deeper:

Which teams are driving the costs? Are they being efficient? Can you link their costs to unit metrics?
Do you have budgets in place for each team? Are you managing teams to those budgets? Are you able to do activity-based costing?
What’s the state of your tagging strategy? Look at what tags are in place and what they are reporting. Look at coverage gaps. Consider how to allocate untaggable or shared costs.
How will you keep your allocation constructs in sync and up to date? A spreadsheet? Cadence-based syncing with your CMDB? API integration between your FinOps platform and your CMDB?
What is the commitment-based discount strategy? What commitment-based discount programs are there? How well are they being used? How could they be optimized? Which new ones should you make?
And then...rinse and repeat, repeatedly.
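The tagging question above, “look at coverage gaps,” can be made concrete with a small calculation. The sketch below measures what share of total spend carries each required tag. The line-item shape, tag names, and the function `tag_coverage` are all hypothetical; real billing exports differ by provider.

```python
def tag_coverage(line_items, required_tags):
    """Share of total spend (0.0 to 1.0) that carries each required tag."""
    total = sum(item["cost"] for item in line_items)
    coverage = {}
    for tag in required_tags:
        # Count only spend where the tag is present and non-empty.
        tagged = sum(
            item["cost"] for item in line_items if item.get("tags", {}).get(tag)
        )
        coverage[tag] = tagged / total if total else 0.0
    return coverage

# Hypothetical billing line items for illustration only.
items = [
    {"cost": 600.0, "tags": {"team": "payments", "env": "prod"}},
    {"cost": 300.0, "tags": {"team": "search"}},
    {"cost": 100.0, "tags": {}},
]
print(tag_coverage(items, ["team", "env"]))
```

Measuring coverage by spend rather than by resource count keeps attention on the gaps that matter most to allocation.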

You pick the low-hanging fruit in each phase and then move on to the next. Then you readjust your goals and go back around. But you should always start with cost visibility. Then in the next cycle, you set basic budgets, discussed in Chapter 13. And the cycle after that, you’re managing them.

Each time you ask harder questions, always making sure that the other teams are brought along with you. In fact, their education is arguably more important than the FinOps practitioner’s skills. The winning FinOps culture is a village that works together, not a roomful of individuals who are trying to speed through some sweeping changes.

Organizational Work During This Phase

Beyond the work of spend data analysis, there’s much organizational and cultural work to be done to create a FinOps culture. During this phase of a healthy FinOps practice, you will also focus on:

Getting executives aligned around goals and the changes to your working models that cloud brings
Helping engineers understand their expanded role in affecting the business, particularly around cost as a new efficiency metric to consider
Ensuring the right skills are present on your team (FinOps.org has more detail)
Building bridges with colleagues in engineering, finance, IT, architecture teams, procurement, product teams, security, ITAM groups, and the like, and evangelizing FinOps work internally via events like a FinOps day to share the concepts, impact, and early wins
Aligning with engineering teams (and helping to take work off their plates) by managing as much of the rate optimization as possible while ensuring they have the data needed to manage usage optimization

Transparency and the Feedback Loop

In Chapter 1, we explored the Prius Effect and the impact of real-time visibility on your behavior. In an electric car, the flow-of-energy display enables you to see how the choice you’re making in the moment—one that in the past may have been unconscious—is impacting the amount of energy you’re using. Likewise, in the inform phase, you rely on constantly arriving data to drive real-time decision making and build accountability.

Recently, we were asked, “Is near-real-time data needed for everything in the cloud?” We considered this question but couldn’t think of a single example of a report that should be looked at only once a month. Things move far too quickly in the cloud, which is due less to the computers themselves and more to the human behavior that’s driving cloud innovation. During early FinOps maturity, reports are viewed daily or weekly. As FinOps matures they’re viewed on an agreed-upon cadence (or based on anomalies), and in very advanced FinOps practices they’re viewed after an alert showing that a metric you’ve set has crossed the threshold. Chapter 22 digs into what that advanced stage scenario looks like.

Tip

Spending great amounts of time and/or money on making your data more real time than it needs to be for the processes you have inside your organization can be a source of frustration for new FinOps practitioners. Aim for frequently updated cloud spend data, but don’t go beyond what benefits your organization. Again, don’t try to boil the ocean at the risk of optimizing somewhere other than the bottleneck.

A critical capability in this phase is the ability to detect anomalies in cloud spend. Anomalies are spending patterns that deviate from a typical value (average), a slope over time (trend), or a cyclical repeating pattern (seasonality). They are the proverbial needle in the haystack: hard to detect among the complexity (and size) of most cloud bills, yet they can really add up.
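To make the simplest of those three cases concrete, the sketch below flags days whose spend deviates sharply from a trailing average. It covers only deviation from a typical value, not trend or seasonality, and it is far simpler than the detection that cloud providers and FinOps platforms offer. The function name `find_anomalies`, the window size, and the threshold are illustrative assumptions.

```python
from statistics import mean, stdev

def find_anomalies(daily_spend, window=7, threshold=3.0):
    """Flag days whose spend deviates from the trailing-window average.

    Returns a list of (index, value) pairs. `window` and `threshold`
    are illustrative defaults, not recommended settings.
    """
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        avg = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Flat history: any change at all counts as a deviation.
            if daily_spend[i] != avg:
                anomalies.append((i, daily_spend[i]))
            continue
        # Distance from the trailing average, in standard deviations.
        z = (daily_spend[i] - avg) / spread
        if abs(z) > threshold:
            anomalies.append((i, daily_spend[i]))
    return anomalies

# Hypothetical daily spend with one obvious spike.
spend = [100, 102, 98, 101, 99, 103, 100, 101, 180, 102]
print(find_anomalies(spend))
```

Note that a spike also inflates the trailing window for the days that follow it, which is one reason production tools rely on more robust models.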

Stories from the Cloud—FinOps Community

A remote engineering team at a multinational pharmaceutical company spun up three x1e.32xlarge instances in Sydney for testing of in-memory databases. At the time, an instance of this size cost just over $44 per hour. The three instances together cost over $3,000 per day, or around $98,000 per month. These seem like big numbers until you consider that the team’s monthly cloud bill was over $3,500,000. So this change would have resulted in a paltry increase of less than 3% in spend and wouldn’t have been easily visible in high-level reporting.

Further obscuring this spend anomaly, the central team had just purchased RIs for another set of machines, a transaction that effectively canceled out the spend delta of the new X1e instances. However, because the FinOps team had machine learning–based anomaly detection, they found out about the use of the large instances the same day and were able to have an immediate conversation about whether or not so much horsepower was needed. Unsurprisingly, it turned out that it was not.

Granted, this is a story of a more mature stage company. A less mature FinOps practice typically starts with simple daily spend visibility that shows teams their respective spend. Even that amount of visibility still begins to influence their behavior accordingly.
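The arithmetic behind the story is easy to reproduce. The sketch below uses rounded-down inputs as assumptions: an hourly rate of exactly $44 (the story says “just over”), a 31-day month, and a bill of exactly $3,500,000 (the story says “over”).

```python
HOURLY_RATE = 44.0        # assumed: the story says "just over $44 per hour"
INSTANCE_COUNT = 3
DAYS_IN_MONTH = 31        # assumed month length
MONTHLY_BILL = 3_500_000  # assumed: the story says "over $3,500,000"

daily_cost = HOURLY_RATE * INSTANCE_COUNT * 24
monthly_cost = daily_cost * DAYS_IN_MONTH
share_of_bill = monthly_cost / MONTHLY_BILL

print(f"daily: ${daily_cost:,.0f}")
print(f"monthly: ${monthly_cost:,.0f}")
print(f"share of bill: {share_of_bill:.1%}")
```

With these inputs the instances account for a little under 3 percent of the monthly bill, small enough to disappear in a top-level report even though the absolute cost is large.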

Luckily, because identifying anomalous spending is such a critical issue, all cloud providers and many third-party tools and platforms provide you with the ability to spot anomalies more easily. To manage anomalies effectively, you must:

Have a solid cost allocation strategy—covered in the next chapter.
Frequently look at—or programmatically triage—the anomaly reporting.
Assign anomalies to the appropriate team responsible for the spend.
Follow a process for investigating and closing out anomaly tickets.
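The dependency between allocation and anomaly handling can be sketched in a few lines: an anomaly can be assigned only if something maps the affected account to an owner. The names `assign_anomalies`, the account IDs, the team names, and the `finops-triage` fallback queue are all hypothetical.

```python
def assign_anomalies(anomalies, owners, fallback="finops-triage"):
    """Group anomalies by the team that owns the affected account.

    Accounts with no known owner go to a fallback queue, which is
    itself a signal that the allocation data has a gap.
    """
    tickets = {}
    for anomaly in anomalies:
        team = owners.get(anomaly["account"], fallback)
        tickets.setdefault(team, []).append(anomaly)
    return tickets

# Hypothetical ownership map and detected anomalies.
owners = {"acct-001": "payments", "acct-002": "search"}
anomalies = [
    {"account": "acct-001", "delta": 3168.0},
    {"account": "acct-999", "delta": 450.0},
]
print(assign_anomalies(anomalies, owners))
```

A fallback queue that keeps growing suggests the allocation strategy needs attention before the anomaly process can work well.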

It is unfortunate how frequently anomalies cost organizations huge amounts of money that could have been greatly reduced if they simply paid attention to the anomaly alerts or knew who to talk to after the anomaly was detected. Managing anomalies starts in the inform phase. If you are not looking at your anomaly reporting, stop reading and go turn it on, then come back and finish this chapter. It’s that important.

Benchmarking Team Performance

Using scorecards that benchmark team performance is the best way to find out how teams compare. Scorecards allow you to find the lowest-performing and highest-spending teams and also give you insight into how you benchmark against others in the industry.

Scorecards should also show you opportunities for improvement against core efficiency metrics. They should give executives a view of the entire business (or a portion of it) to see how everyone is doing and be able to drive down to an individual team level for actionable data. They are the CxO’s best friend and best weapon to effect change. Scorecards should drive efforts by the teams and unify the experience between disparate teams who are working on similar efforts. And scorecards help teams compete against each other.

In a previous FinOps Foundation call, Intuit’s Dieter Matzion, now at Roku, shared his approach to scorecarding. The key items were:

EC2 efficiency via a rightsizing score
Rate commitment program coverage and efficiency
Elasticity measure to determine how well teams were taking advantage of the ability of cloud to bring resources up and down, based on workload

In addition to giving each team individual scorecards, tracking their efficiency across multiple metrics, Dieter also created an executive-level view that rolled up the scores on a team-by-team basis. This type of visibility shined a bright light on areas of opportunity. And it drove improvement, showing again that no one wants to be on the worst offender list.
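A team-by-team roll-up of this kind might be sketched as follows. This is not Dieter’s actual method: the 0 to 100 scale, the unweighted average, the metric keys, and the team names are all assumptions made for illustration.

```python
from statistics import mean

def rank_teams(scorecards):
    """Return (team, overall_score) pairs sorted from lowest to highest.

    Assumes every metric is already normalized to a 0-100 scale and
    that all metrics carry equal weight (both are assumptions).
    """
    overall = {
        team: mean(metrics.values()) for team, metrics in scorecards.items()
    }
    return sorted(overall.items(), key=lambda pair: pair[1])

# Hypothetical per-team scores for the three kinds of metrics listed above.
scorecards = {
    "payments": {"rightsizing": 80, "commitment_coverage": 90, "elasticity": 70},
    "search": {"rightsizing": 50, "commitment_coverage": 40, "elasticity": 60},
}
print(rank_teams(scorecards))
```

Sorting from lowest to highest puts the teams with the most room for improvement at the top of the executive view.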

Tip

A recording of Dieter’s presentation, in which he drills into the specific metrics used to benchmark teams, is available on the FinOps Foundation website.

What Great Looks Like

As in all disciplines, there is no shortcut to becoming a high performer, so take the following measures of greatness with a grain of aspirational salt. Organizational FinOps muscle will develop only over months and years of practice. Of course, you can do things to help speed up the process, but ultimately you need to put in the time required for learning and maturing the cultural changes within an organization. In fact, you can do your business a disservice by trying to go straight to high performance. It inevitably results in mistakes, which bring about unexpected costs.

Table 10-1 shows various levels of proficiency across key metrics. Quantitative data was taken from the more than $9 billion of public cloud spend in Apptio Cloudability’s dataset in 2019, while qualitative data was taken from a survey of hundreds of cloud consumers by the 451 Group.¹

Table 10-1. Metrics of low, medium, and high performers in public cloud
Metric	Low performers	Medium performers	High performers
Visibility and allocation of cloud spend	Reliant on vendor invoices and manual reconciliation	>1 day for partial visibility with limited retention of granular historical data	<1 hour or near-real-time visibility of all spend with all current and historical data retained
Showback or chargeback	Inability to provide teams an accurate accounting of cloud spend	Cloud spend is allocated to teams based on estimated usage of resources	Teams understand their portion of cloud spend based on actual consumption
Team budgets	Teams have no budgets	Teams have budgets	Teams budget and track spend against budgets
RI and CUD management	0%–20% of cloud services purchased via reservations	40%–50% of cloud services purchased via reservations	80%–100% of cloud services purchased via reservations
Find and remove underutilized services	Every few months	Weekly	Automated and policy driven
Unit economics	May not use	More technical, team-based cost per compute hour, etc.	More business-focused cost per customer, etc.

High performers are able to ask and answer complex questions about their spend and forecasts quickly. They run what-if analyses on changing deployments, and they understand the impact of those changes on their unit economics. Due to the granular visibility and allocation of cloud spend, they know where to look for opportunities to optimize their operational metrics. They can also identify gaps in their information, or automation they should build, to help them mature in the inform phase.

This high level of capability enables businesses to be more competitive through speed of innovation, gives management and teams a better understanding of costs and COGS, and enables more insight into pricing of services.

Conclusion

The natural resting state of the FinOps lifecycle is the inform phase. Here, you understand the current state of your cloud financial management, which will help you identify opportunities to perform optimizations and operational improvement in the next phases.

To summarize:

Build context around your financial data in order to answer questions.
Use data to monitor spend and plan for optimizations and efficiency.
FinOps culture helps personas across disciplines to work together to answer important questions about usage and cost.
Following the FinOps maturity model is important. Each time you pass through the inform phase, you will be able to answer even more complex questions about your cloud as you mature.

We’ve identified the questions, but without the ability to divide cloud spend into more granular cost groups, answers about your overall spending are only so useful. You need a method of identifying which costs belong to which group to perform team-by-team reporting, showback, budgets, and forecasts. Then you can more effectively benchmark teams and compare them to each other. This is the subject of the next chapter, in which we introduce the concepts of cost allocation.

¹ 451 Research, Cost Management in the Cloud Age: Enterprise Readiness Threatens Innovation (New York: 451 Group, 2019), https://oreil.ly/71tao.
