An engineer’s guide to FinOps

Our very own FinOps expert, Tom Cross, gave a great presentation at the first Amach Meet Up in Dublin. The following is the transcript of the video, for those who prefer the long read, including the Q&A from the end of the session!

Thanks for coming everyone.

Today I wanted to give you an introduction to cloud FinOps from an engineering perspective and why it’s important for engineers to get involved. A quick intro about myself first: I’m Tom Cross, as you can see on the slide, Technical Manager at Amach, currently working with Aer Lingus as a Cloud Infrastructure Technical Lead, and a member of the Technical Advisory Council for the FinOps Foundation. Before this, I worked at a cloud-native managed service provider doing cost optimization and cloud resale for their customers, and before that I was a software engineer. That’s where I really developed my passion for optimizing things and making things work better than they used to, and that’s what got me into FinOps.

So, let’s jump in. Before we talk about what FinOps is, we need to understand a bit about how the advent of cloud made FinOps necessary. Once upon a time, owning and managing your own hardware was the norm. The barrier to entry was high and so was the cost of making mistakes, if you bought the wrong stuff. This led to hardware being massively over-provisioned, but it didn’t really matter. Once it was paid for, nobody worried about the cost; it was all there, running, and for the most part, costs were stable and predictable. Finance, the gatekeepers to the kingdom, were living the dream.

Then comes the cloud, changing everything. Suddenly, anyone with a credit card can get involved, with a much lower barrier to entry, cheap experimentation, and low risk. The rise of the DevOps movement, as Callum referred to earlier, brought new processes and further increased the pace of innovation. However, while cloud and DevOps made it easier for companies to provision and manage their infrastructure, they also created new challenges. With the ability to spin up new infrastructure on demand, it becomes increasingly difficult to keep track of costs. Many companies even today find themselves with massive cloud bills from their providers and no real idea where those costs are coming from. The elasticity of cloud, combined with complicated billing models, makes it difficult to predict costs.

Predicting costs is difficult and stressful for finance teams because their expertise is in forecasting, but old approaches don’t work in the cloud, leading to chaos. They need a new model. In some companies today, even buying a $10 USB stick requires forms and approvals, yet a cloud engineer can provision thousands of dollars of infrastructure without any approval with just a click. This challenge led to the rise of FinOps. Billing in the cloud is complex, and most people are too far removed from it in their day-to-day work, leading to problems. Understanding cloud billing, even at a high level, gives an edge.

Let’s consider a simplified example. The cost for any service is the rate multiplied by usage in any given time period. You might be billed by the hour or by the gigabyte, but the calculation is generally the same. If you’re trying to maximize cost efficiency, you’ll need to consider these factors.
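As a minimal sketch of that calculation in Python: the line items, rates, and usage figures below are made up for illustration and are not real provider prices.

```python
def period_cost(rate_per_unit: float, units_used: float) -> float:
    """Cost for one billing period: rate multiplied by usage."""
    return rate_per_unit * units_used


# Hypothetical line items: (description, rate per unit, units used).
line_items = [
    ("compute, billed per hour", 0.10, 720),
    ("storage, billed per gigabyte", 0.02, 500),
]

# The bill for the period is the sum of rate * usage over every line item.
total = sum(period_cost(rate, units) for _, rate, units in line_items)
print(f"total for the period: {total:.2f}")
```

The two arguments of `period_cost` are the two levers: you can lower the rate you pay, or lower the amount you use.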

Efficiency can be optimized both in rate and usage, but the levers available are different. On the rate side, it’s purely financial with no changes to your infrastructure. To optimize effectively, you need to understand various options like committed use discounts and enterprise pricing. However, you can’t make these commitments unless you understand future usage, like whether an application will run for another two years. Engineers who built the systems are crucial in this aspect.
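To illustrate why future usage matters for rate optimization, here is a simplified sketch. It assumes a commitment is paid for every hour of the term whether or not the resource runs, while on-demand is paid only for hours actually used. The rate, discount, and hours are hypothetical, and real committed use discounts vary by provider, term, and payment option.

```python
def on_demand_cost(rate: float, hours_used: float) -> float:
    """Pay the full rate, but only for the hours actually used."""
    return rate * hours_used


def committed_cost(rate: float, discount: float, hours_in_term: float) -> float:
    """Pay a discounted rate for every hour in the term, used or not."""
    return rate * (1 - discount) * hours_in_term


def commitment_is_cheaper(rate, discount, hours_in_term, expected_hours_used):
    """True when committing costs less than staying on demand."""
    return committed_cost(rate, discount, hours_in_term) < on_demand_cost(
        rate, expected_hours_used
    )


def break_even_utilisation(discount: float) -> float:
    """Fraction of the term you must actually use for a commitment to pay off."""
    return 1 - discount


# Hypothetical: $0.10 per hour on demand, 30% discount, one-year term.
HOURS_IN_YEAR = 8760
print(commitment_is_cheaper(0.10, 0.30, HOURS_IN_YEAR, HOURS_IN_YEAR))
print(commitment_is_cheaper(0.10, 0.30, HOURS_IN_YEAR, 4000))
print(f"break-even utilisation: {break_even_utilisation(0.30):.0%}")
```

Under these assumptions, a commitment only pays off if the workload keeps running for most of the term, which is exactly the information the engineers who built the system are best placed to provide.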

On the usage side, it’s entirely in the engineers’ hands. Every engineer has the power to influence this. Every commit, every deployment is a financial commitment on behalf of your company. This is the most important part of the equation, especially from a sustainability or waste perspective, because if you don’t need it, then 100% of that cost is going to be wasted. That puts engineers in a real position of power and, as we all know, with great power comes great responsibility.

The only way this is going to work is with collaboration. Your FinOps or finance team can’t do things in isolation; they need to understand what the future is going to look like. For engineers, it can be challenging to optimize usage if you don’t know where the biggest spend is or where the optimization opportunities are. A FinOps team can give you the data you need to be laser-focused on the most impactful changes you can make as an engineer.

The complexity lies in the scale of the billing data. You’re consuming tens or hundreds of services across multiple accounts and cloud providers, and all these services are billed at increasingly granular levels, sometimes down to the millisecond. All this infrastructure is scaling, changing, and expanding all the time. All of this leads to absolutely massive volumes of data. It’s not uncommon to get millions or billions of lines of data in your billing files, and you can’t put it in Excel; it’s just too big. You need new tools and processes to help get the information to the people who need it as quickly as possible.
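One way to cope with files that are too big for a spreadsheet is to stream them row by row and keep only running totals in memory. The sketch below assumes a CSV export with hypothetical `service`, `account`, and `cost` columns; real billing exports have their own, much wider schemas.

```python
import csv
import io
from collections import defaultdict


def cost_by_service(rows):
    """Sum cost per service while reading one row at a time."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["service"]] += float(row["cost"])
    return dict(totals)


# A tiny in-memory stand-in for a billing file (hypothetical columns and values).
sample = io.StringIO(
    "service,account,cost\n"
    "compute,prod,12.50\n"
    "storage,prod,3.25\n"
    "compute,dev,4.00\n"
)

totals = cost_by_service(csv.DictReader(sample))
for service, cost in sorted(totals.items()):
    print(f"{service}: {cost:.2f}")
```

Because `csv.DictReader` yields rows lazily, the same function works on a file handle for a very large export without loading the whole file at once. At real scale you would more likely reach for a data warehouse or a dedicated cost tool.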

This is a bit of an intermission with a quiz, a bit of interactivity for fun. I’m going to ask a bunch of questions and see if anyone knows some stuff. So, how much do people think was spent on public cloud in 2022 globally in dollars? Would anyone like to guess? 100 billion, 1 billion dollars, that’s a big number. Anyone want to go higher? 200, 200 billion. It’s actually $490 billion, which is a pretty big number, right? How much do people think the forecast for 2023 is? Any guesses? What do we think? 525 billion, that’s a very precise guess. It’s actually a little shy though, it’s 600 billion. 600 billion is bigger than the GDP of Ireland. I was doing some Googling earlier, and Google tells me there are 1.7 million houses in Ireland. If you assume that every house in Ireland was the same price as the average house in Dublin, you could buy every house in Ireland and still have plenty of change to live very well with 600 billion. So, that’s a serious chunk of change.

So, next question, what percentage of that spend do you think is waste, like unutilized resources or over-provisioned resources? How much do you think? 15, 50, 50 is high, wow, Jesus, we’re in trouble. It’s 32%, almost a third, but that’s still pretty ridiculous, like it’s disgustingly high in my view.

And the last question, what do we think the forecast public cloud spend is for 2025? Adrian, a trillion, a trillion, it’s not even close mate, not even close. It’s 1.9 trillion. So if that trend of waste continues into 2025, there’ll be more wasted in 2025 than there was spent in 2023. So even if you don’t care about the money at all, think of the environmental cost of that. The cloud has already got a bigger carbon footprint than the airline industry, which is pretty crazy, right? And I personally feel that we have an obligation to use only what we need. One of the great benefits of the cost optimization side of FinOps is that it aligns really well with digital sustainability, and it seems that cost is actually a pretty good proxy for carbon. So, by helping to optimize your cloud costs, you’re doing your bit for the environment and helping to reduce your carbon footprint. So, it’s a bit of a win-win.
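The comparison between 2025 waste and 2023 spend can be checked with the figures quoted in the talk. The only assumption is the one stated above: that the 32% waste share stays the same in 2025.

```python
# Figures as quoted in the talk, in billions of dollars.
SPEND_2023_BN = 600
FORECAST_2025_BN = 1900
WASTE_SHARE = 0.32  # assumed to stay constant through 2025

wasted_2025_bn = WASTE_SHARE * FORECAST_2025_BN

print(f"wasted in 2025 (if the trend holds): {wasted_2025_bn:.0f} billion")
print(f"spent in 2023: {SPEND_2023_BN} billion")
print(f"2025 waste exceeds 2023 spend: {wasted_2025_bn > SPEND_2023_BN}")
```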

Okay, so what exactly do we mean when we talk about FinOps? The word itself is a portmanteau of ‘finance’ and ‘DevOps’. And just like DevOps, as Callum was explaining earlier, it’s rooted in collaboration, automation, and continuous improvement. People often equate FinOps purely with cost optimization, and there’s certainly an element of that, but it goes much deeper.

So, let’s look at the definition according to the FinOps Foundation. FinOps is an evolving cloud financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, and business teams to collaborate on data-driven spending decisions. You can see I’ve added a couple of highlights in there, just for things that I find interesting. The first one is ‘cultural practice’, because this is really about changing the culture of businesses, increasing collaboration, and giving teams a better way to manage their cloud spend, where everyone takes ownership of their own usage. The second highlight, ‘maximum business value’, is about ensuring teams have the tools necessary to maximize the efficiency of software delivery. Cost is just another metric to measure efficiency, and the more efficient we are, the greater the return on our investment. The last highlight, ‘data-driven’, is about ensuring teams have the data they need to make better decisions and understand the trade-offs when building solutions in the cloud. If you can get granular to things like the cost per transaction or the cost per API call of your application, you can start to see how the changes you’re making are going to affect those metrics and make objective decisions about whether a new feature or an optimization is worth the effort.
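A unit metric such as cost per transaction is just total cost divided by the number of transactions in the same period. The sketch below compares a baseline month with a month after a hypothetical feature launch; all of the figures are invented for illustration.

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    """Unit cost for a period, for example cost per transaction or per API call."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Hypothetical months: (cloud cost in dollars, transactions served).
baseline = cost_per_unit(12_000.0, 3_000_000)
with_feature = cost_per_unit(15_000.0, 3_100_000)

# Relative change in unit cost, which is independent of traffic growth.
change = (with_feature - baseline) / baseline

print(f"baseline:     ${baseline:.4f} per transaction")
print(f"with feature: ${with_feature:.4f} per transaction")
print(f"change:       {change:+.1%}")
```

Tracking the unit cost rather than the total bill separates "we are serving more customers" from "each transaction got more expensive", which is the distinction that matters when deciding whether a feature is worth it.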

We’ve talked a bit about what we mean when we say FinOps, and I wanted to give you a really quick look at the FinOps Framework. I’ve stolen this slide from the FinOps Foundation, who invented the framework, and whilst I don’t have time to go into it all in great detail, I just wanted to give you a bit of an overview and show you the core principles that drive FinOps. There’s the personas that FinOps engages with as stakeholders, the phases, and then down the bottom there’s the domains and capabilities, the activities that we perform as a FinOps practice.

I just want to drill in on one bit of this, which is the phases of FinOps. This is very similar to, but a lot smaller than, the DevOps life cycle. It’s modeled as a continuous feedback loop where you incrementally improve your processes every time you go through the cycle. We start on the blue section there, Inform, and this is about creating visibility, providing everyone in the business with the data they need to be able to make better decisions. The second phase is Optimize. This is where we identify opportunities to optimize based on the insights from the first phase. This is basically where we build the plan and decide what we’re going to do. The third phase is Operate. This is where we execute on the plan from the Optimize phase, and then we reflect on our changes before we go through the process again if we want to make any improvements. The important thing with this cycle is not to fixate on how long it takes to go through or when one phase starts and another one ends, but to start small, iterate, and start to develop that muscle memory and build trust within the teams that you’re working with.

Okay, so what can engineers do to support a FinOps initiative or to support cost management? First thing, tagging. This is probably one of the least interesting but most important things that you can do as an engineer because it underpins so many of the other things, and you can’t manage what you can’t measure. If you can’t attribute costs to your applications, then it becomes almost impossible to manage your costs. In my view, this only gets harder as you scale, so the sooner you start the better. If you don’t already have a tagging policy, get on it: sit down with your colleagues and agree one, because it’s essential. It’s table stakes, I think that’s what they say. A good starting point is being able to attribute costs by whatever business unit or organizational unit your company uses, so that you can show at a high level how cost is distributed across the organization. Then you’ve got the workload-level tags, so you can divide costs by different applications and things like that, and lastly the environment, so you can segregate your production and your non-production costs. That’s a good starting point for most people.
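A tagging policy is easier to keep honest if something checks it automatically. The sketch below uses the three tags suggested above; the tag keys, resource names, and the dictionary format are assumptions for illustration, not any provider’s API.

```python
# Hypothetical tag keys for the three levels: business unit, workload, environment.
REQUIRED_TAGS = ("business_unit", "workload", "environment")


def missing_tags(resource_tags: dict) -> list:
    """Return the required tag keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


# Hypothetical inventory: resource name mapped to its tags.
resources = {
    "vm-checkout-01": {
        "business_unit": "retail",
        "workload": "checkout",
        "environment": "prod",
    },
    "bucket-tmp": {"workload": "experiments"},
}

# Keep only the resources that break the policy, with the tags they lack.
non_compliant = {
    name: missing_tags(tags)
    for name, tags in resources.items()
    if missing_tags(tags)
}

for name, missing in non_compliant.items():
    print(f"{name} is missing: {', '.join(missing)}")
```

In practice the inventory would come from your cloud provider’s resource listing, and a check like this could run in a pipeline so that untagged resources are caught before they reach production.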

Next, resource scheduling. Automated resource scheduling is the cloud equivalent of turning off the lights when you leave the room. It’s really easy to do and switching off the development environment out of hours can save you around 75% of your run cost so it’s definitely worth investing the time in.
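The figure of around 75% can be sanity-checked with simple arithmetic. This sketch assumes cost scales linearly with running hours, which ignores things like storage that keep billing while an environment is switched off; the 8 hours a day, 5 days a week schedule is an example, not a recommendation.

```python
HOURS_PER_WEEK = 7 * 24


def scheduled_saving(hours_on_per_day: float, days_on_per_week: int) -> float:
    """Fraction of always-on run cost saved by only running on a schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


# Example schedule: a development environment up for 8 hours on each weekday.
saving = scheduled_saving(8, 5)
print(f"running 40 of {HOURS_PER_WEEK} hours saves about {saving:.0%}")
```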

Next up, budget alarms. Again, very simple to implement. All the different cloud providers have this capability out of the box pretty much and nobody likes to be told at the end of the month that they’ve exceeded their budget. So, have a look at historical trends for your applications, set up an alarm, and the cloud provider will send you a little alert when you’re going to exceed it. It’s a lifesaver, especially in really dynamic environments that are using lots of autoscaling or serverless functions.
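The built-in budget tools do this for you, but the idea behind a forecast-based alert is simple enough to sketch. This version assumes a straight-line projection from spend so far this month; the function names and figures are illustrative and do not correspond to any provider’s API.

```python
def projected_month_end_spend(
    spend_to_date: float, day_of_month: int, days_in_month: int
) -> float:
    """Extrapolate the average daily spend so far to the end of the month."""
    daily_run_rate = spend_to_date / day_of_month
    return daily_run_rate * days_in_month


def should_alert(
    spend_to_date: float, day_of_month: int, days_in_month: int, budget: float
) -> bool:
    """True when the projected month-end spend would exceed the budget."""
    projection = projected_month_end_spend(spend_to_date, day_of_month, days_in_month)
    return projection > budget


# Hypothetical: $500 spent by day 10 of a 30-day month, against a $1,200 budget.
print(projected_month_end_spend(500.0, 10, 30))
print(should_alert(500.0, 10, 30, budget=1200.0))
```

Alerting on the projection rather than on actual spend is what gives you the warning before the end of the month instead of after it.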

Next up, we have right sizing. Optimizing for performance efficiency can deliver some really hefty savings and it’s simple to do. The last one I’ve got on here is just basic housekeeping. Just tidy up after yourselves, delete resources you aren’t using. Focus on efficiency and reducing waste and spend the company’s money as if it was your own.
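As a rough illustration of right sizing, the sketch below picks the smallest size that covers observed peak CPU demand plus some headroom. The size names, vCPU counts, and the 20% headroom are assumptions, and a real decision would also look at memory, network, and storage.

```python
# Hypothetical instance sizes, smallest first: (name, vCPUs).
SIZES = [("small", 2), ("medium", 4), ("large", 8), ("xlarge", 16)]


def right_size(current_vcpus: int, peak_utilisation: float, headroom: float = 0.2) -> str:
    """Smallest size whose vCPUs cover peak demand plus headroom.

    peak_utilisation is the observed peak as a fraction of current capacity.
    """
    needed = current_vcpus * peak_utilisation * (1 + headroom)
    for name, vcpus in SIZES:
        if vcpus >= needed:
            return name
    # Nothing is big enough, so fall back to the largest size available.
    return SIZES[-1][0]


# A 16 vCPU instance that never goes above 20% CPU is a candidate to shrink.
print(right_size(16, 0.20))
# An 8 vCPU instance peaking at 90% needs more room, not less.
print(right_size(8, 0.90))
```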

And I know what you’re thinking. You’re thinking, “Tom, my team’s already busy, we’ve got sprints full of work and now you’re telling me I have to worry about costs and stuff as well.” And well, yes, I am. As we talked about already, cost is just another efficiency metric. So put it in the tool bag and get on with it.

I think the engineers are the only people who understand what it costs or what’s needed to run an application. With the adoption of cloud, engineers hold all the power, and that is where the ownership for the spend needs to sit. You build it, you run it, you own it.

I’ve mentioned the FinOps Foundation a couple of times but haven’t really told you about it. The FinOps Foundation was started in February 2019 by J.R. Storment, the co-founder of Cloudability.

In 2020, they joined the Linux Foundation. They have about 10,000 members globally, continuing to grow, and their mission is advancing the people who manage the value of cloud wherever they are. They pursue this through their core objectives, which are creating connections, inspiring growth, and empowering best practices. If you’re interested in finding out anything about FinOps, I encourage you to check out their website. It’s got a ton of useful material, a friendly and active Slack group, the framework we talked about earlier, and lots of other interesting stuff. They’ve got working groups around cost management for containers, which is a challenging topic if you want to get into it, and they also host meetups, and they even have a Dublin chapter, so it’s a good time to get involved in that.

To round things off, I’m going to finish with my favorite FinOps quote, which is, “FinOps isn’t about saving money; FinOps is about making money.” For me, hearing this was a bit of a lightbulb moment because up to that point, I’d been looking at FinOps purely through the lens of cost optimization.

As I’ve learned since then, and hopefully I’ve shown you a bit of that this evening, it’s about much more than that. It’s about empowering engineers to make better decisions that drive business value, creating a culture of accountability and ownership across teams where everyone understands the impact of their work.

Remember, the best way to save money on cloud is not to spend it in the first place. Be ruthless about hunting down and eliminating waste. In this way, I think that FinOps is the best hope for starting to address some of the massive waste that our industry is responsible for and being a little bit more sustainable.

That’s me. So, I’ve got a couple of calls to action. If any of this has been interesting to you, go and educate yourself, find out a bit more about FinOps, join the community, and then be curious. Ask all the questions, speak to the people in your own companies, find out how they’re managing their cloud spend, if they are, and how much pain they’re in. Maybe you can help them. If anyone’s got any questions, fire away, or we can just get straight to the beer and pizza, which works for me too.

I suppose more an observation than a question, but you know, I came from an AWS perspective, and it all sounds great in terms of scaling to zero and stuff not being used, but it’s not necessarily priced appropriately.

They’ll seek a premium for serverless stuff because they need to make a return. But you’re only paying for it when it’s running, so you pay a small premium for the fact that it’s fully managed, but generally speaking, unless you’re using something that’s getting hammered all the time, like if you’ve got a serverless thing that’s running 24/7, you’re going to be paying through the nose for that. So like you’re using something like Fargate in AWS, that would be far better running on EC2 because it just doesn’t make sense to run something serverless if it’s running 24/7, unless you just don’t want the hassle of managing the infrastructure. But there’s a trade-off there. You can either pay someone to manage infrastructure for you or pay AWS to do it. It depends on how big your team is, but in terms of trying to get all these workloads working more efficiently, you could have more drive behind people changing their architectural practices.
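The trade-off described here comes down to a break-even point: how many hours a month a workload has to run before an always-on server becomes cheaper than paying a premium per running hour. The sketch below uses invented hourly prices, not real Fargate or EC2 rates, and deliberately ignores the engineering effort of managing servers.

```python
HOURS_IN_MONTH = 730  # average number of hours in a month


def serverless_cost(price_per_running_hour: float, hours_running: float) -> float:
    """Pay a premium rate, but only while the workload is running."""
    return price_per_running_hour * hours_running


def always_on_cost(price_per_hour: float, hours_in_month: float = HOURS_IN_MONTH) -> float:
    """Pay a lower rate for every hour of the month, busy or idle."""
    return price_per_hour * hours_in_month


def break_even_hours(
    serverless_price: float, always_on_price: float, hours_in_month: float = HOURS_IN_MONTH
) -> float:
    """Running hours per month above which the always-on option is cheaper."""
    return always_on_price * hours_in_month / serverless_price


# Hypothetical prices: $0.06 per running hour serverless, $0.04 per hour always on.
threshold = break_even_hours(0.06, 0.04)
print(f"always-on wins above roughly {threshold:.0f} running hours a month")
print(serverless_cost(0.06, 100) < always_on_cost(0.04))
print(serverless_cost(0.06, HOURS_IN_MONTH) < always_on_cost(0.04))
```

With these made-up prices, a workload that runs a few hours a day is cheaper on the premium option, while one that runs around the clock is not, unless the saved management effort is worth the difference to your team.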

Architectural changes are probably the most challenging part of FinOps because people can be quite wedded to the way they like to do something, and it can be more expensive from an engineering perspective to implement infrastructure or architectural changes than just right-sizing an instance or turning stuff off at the weekends.

It’s a bit more hardcore, so you really have to be able to understand the total cost of something, including the engineering effort and the onward costs of managing it, to really get into that. Which is a bit more challenging. Good question though, thank you.

What about a multi-cloud approach, because of the different prices of the cloud providers?

I think, generally speaking, FinOps is pretty cloud agnostic. It gets more challenging when you’re running multicloud, but in my view, and I think in the view of quite a lot of people I speak to in the FinOps community, trying to hop around between cloud providers chasing the best rates is a fool’s errand. You end up losing out on a lot of the functionality that’s baked into your cloud platform when you generalize something so that it’ll run in any cloud. It’s like a race to the bottom in terms of feature sets. You end up spending more effort in building a thing and trying to get it hopping around between the clouds than you would have saved if you just had it running in one place in the first place. But it’s an important factor to think of when you’re first going into the cloud, if you don’t have a cloud footprint already: understanding what the different cloud providers can bring to the table and what the different pricing models are is important.

You said effort there, Tom. Do you think the effort to keep things optimized will always be greater than the waste?

I don’t know. I think if you’re tracking waste as a metric, then as long as it’s prioritized alongside all the other things, especially if you’re getting into the unit economics stuff that I was mentioning before, like looking at the cost per API call or the cost per action within your system, if that’s a key KPI for a team then it just becomes naturally prioritized as part of the work. If there’s a new feature being built and it’s going to double the cost per transaction in your system, then you can go back to the business and say this is going to have a real big cost impact, do we really want to do this or is there a different way we can build this? But it means that you’re making that decision at the beginning of the process rather than building the whole thing and going, “Oh, we’re spending double what we were last month, what are we going to do about it?” It’s kind of starting with cost in mind, shift left, shift left, yeah, more shifting.

Anyone else? Grand, in that case, I think it’s beer o’clock. Thank you very much indeed for coming.
