AWS EBS Cost Optimization: 10 Tips to Reduce Your AWS Bill

Optimising cloud cost is an endless fight. We have to keep a constant eye on it and relentlessly look for ways to run the system efficiently over the long term without incurring unnecessary expenses.

What I want to share today is related to Amazon Elastic Block Store (EBS), an integral part of the AWS EC2 service. When operating EC2 instances, sizing EBS volumes appropriately from the start helps you not only run your application smoothly without running out of storage, but also save budget by avoiding massive overprovisioned EBS volumes that you don’t actually need.

Therefore, I’ll share 10 actionable tips that I usually apply in order to optimise EBS usage and cut down unnecessary AWS costs.

1. Allocate an appropriate EBS volume initially.

We should estimate the storage our application will need and allocate just enough space at the start. Avoid provisioning a large volume upfront without knowing the actual requirements. Later, we will add disk monitoring to watch usage over time. If, say, usage crosses a 70% threshold within a given timeframe (a week or so) and shows no sign of slowing down, we can simply grow the EBS volume at that point, instead of paying for capacity we never needed from day one.

If we overprovision EBS disk space initially, it becomes much harder to downsize the volume later than it would have been to grow it only when needed. Downsizing an EBS volume involves many obstacles, such as data migration and a long period of system downtime, and for some critical systems or services it may not be easy to bring everything back after the data migration is complete.
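
To make the monitoring part concrete, here is a minimal sketch of the kind of check I mean, using only the Python standard library on the instance itself. The mount point and the 70% threshold are assumptions you would adapt; in practice you would feed the result into CloudWatch, Grafana, or your alerting tool instead of printing it.

```python
#!/usr/bin/env python3
"""Minimal disk-usage check: warn when a mount point crosses a threshold."""
import shutil

MOUNT_POINT = "/"   # hypothetical mount point of the EBS volume
THRESHOLD = 0.70    # the 70% threshold mentioned above


def check_disk(mount_point: str = MOUNT_POINT, threshold: float = THRESHOLD) -> bool:
    usage = shutil.disk_usage(mount_point)
    used_ratio = usage.used / usage.total
    if used_ratio >= threshold:
        print(f"WARNING: {mount_point} is {used_ratio:.0%} full "
              f"({usage.used // 2**30} GiB of {usage.total // 2**30} GiB)")
        return True
    print(f"OK: {mount_point} is {used_ratio:.0%} full")
    return False


if __name__ == "__main__":
    check_disk()
```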

2. Migrate all “gp2” volumes to “gp3”

Simply migrating any existing gp2 volume to gp3 can save you a significant amount of money every month. As stated in the AWS docs, gp3 is about 20% cheaper per GB than gp2. Another reason is that gp2 relies on a burst I/O credit system (see more here) that delivers unpredictable performance, while gp3 provides a consistent baseline of 3,000 IOPS and 125 MiB/s of throughput, and also lets us increase IOPS and throughput independently for an additional cost. In short, gp3 gives us predictable performance without sacrificing performance or increasing cost.
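
As a hedged sketch of how this migration could be scripted, the snippet below (assuming boto3 is installed and AWS credentials/region are configured) lists gp2 volumes and changes each one to gp3 in place via ModifyVolume; the volume stays attached and usable during the modification.

```python
"""Sketch: convert every gp2 volume in the region to gp3."""
import boto3

ec2 = boto3.client("ec2")  # region/credentials taken from your environment

paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
    for volume in page["Volumes"]:
        volume_id = volume["VolumeId"]
        print(f"Converting {volume_id} ({volume['Size']} GiB) from gp2 to gp3 ...")
        # ModifyVolume changes the type in place, no detach or downtime required.
        ec2.modify_volume(VolumeId=volume_id, VolumeType="gp3")
```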

3. Periodically check disk usage on EBS volumes from all EC2 instances

We might want to schedule a monthly check to see whether any EBS volumes are used far less than expected or are left unattached. For example, someone creates a multi-TB volume for MySQL to test recovery or some other experiment, forgets to delete it afterwards, and the volume sits untouched for months or even years. Another example: your team optimises an application (e.g. moving data from MySQL to another architecture), which significantly reduces disk usage, so the EC2 instance no longer needs such a large EBS volume and you can downsize it if needed. Checking periodically avoids unnecessary cost and spares us from being surprised by an issue we could have controlled.
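
A minimal sketch of the “forgotten volume” part of that check, again assuming boto3 and configured credentials: it simply lists volumes in the “available” state, i.e. volumes that exist (and are billed) but are not attached to any instance.

```python
"""Sketch: list unattached EBS volumes that may have been forgotten."""
import boto3

ec2 = boto3.client("ec2")

paginator = ec2.get_paginator("describe_volumes")
# "available" means the volume exists but is not attached to any instance.
for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
    for volume in page["Volumes"]:
        print(f"{volume['VolumeId']}  {volume['Size']} GiB  "
              f"{volume['VolumeType']}  created {volume['CreateTime']:%Y-%m-%d}")
```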

4. Choose appropriate general-purpose EBS volume types for your applications

In general, we mostly go with gp3 if we don’t have any specific requirements for the EBS volume our application will run on. However, some applications need either an intensive workload with sustained IOPS performance, or the cheapest possible volume type for infrequently accessed data such as big data or streaming logs. In those cases, we can select as below:

  • Throughput Optimized HDD (st1) or Cold HDD (sc1): st1 targets big data and low-cost storage and is ideal for e.g. FTP servers, data warehouses, etc. sc1 is the cheapest option and suits large, infrequently accessed data; a typical use is storing backups and archives where retrieval time doesn’t matter. Bear in mind that neither of these volume types supports bootable or transactional workloads.
  • Provisioned IOPS SSD (io1, io2 Block Express): these volumes target I/O-intensive workloads, such as large relational/NoSQL databases (Oracle, MySQL, PostgreSQL, MongoDB, etc.), Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) systems, etc. We can often use gp3 with additional IOPS instead (see the sketch below), but if we have the budget and are ready to pay a premium to guarantee intensive workloads, io1 and io2 Block Express are the ideal options.
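
For the gp3-with-extra-IOPS alternative mentioned above, a hedged sketch with boto3 might look like this; the volume ID is a placeholder, and the IOPS/throughput numbers are just example values above the gp3 baseline.

```python
"""Sketch: raise IOPS/throughput on an existing gp3 volume instead of moving to io1/io2."""
import boto3

ec2 = boto3.client("ec2")

# vol-0123456789abcdef0 is a placeholder -- substitute your own volume ID.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    VolumeType="gp3",
    Iops=6000,        # above the 3,000 IOPS baseline, billed per provisioned IOPS
    Throughput=500,   # MiB/s, above the 125 MiB/s baseline
)
```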

5. (Optional) Create and attach volumes only when needed by scripting

Apart from running backups via AWS snapshots, there are situations where we back up our data with our own tool. In that case the EC2 instance needs enough disk space to hold the backup file or folder alongside the original data. However, if the backup only takes a few hours, we run it every 2 or 3 days rather than daily, and we don’t intend to keep the backup file on the EBS volume, then keeping a large EBS volume permanently attached to the instance is a waste of money while it sits mostly unused.

My real case

With that said, I have an EC2 instance (us-east-1, N. Virginia) running MySQL, with two gp3 EBS volumes attached. The first volume is 2 TB and holds the original MySQL data. The second volume is 1 TB, and I only use it to store the backup file when I run the MySQL backup script. Once the script finishes, it uploads the backup file to another AWS region and deletes the local copy. I also have an EBS snapshot script running daily for this MySQL server, so I don’t need to run the backup script daily; every 3 days is enough. Furthermore, I don’t have just one 2 TB MySQL server but another four as well, and each of these databases needs its own 1 TB second volume just to store the backup file. Keeping these second volumes around for the whole month costs (1,024 GB × $0.08 per GB-month) × 5 volumes = $409.60 per month.

I’ve adjusted the backup script so that when the cron task triggers it every 3 days, it creates the second EBS volume, attaches it to the instance, performs the MySQL backup, uploads the file to another region, detaches the volume, and deletes it afterwards. This cuts the cost to roughly $136 per month (possibly lower), because each second volume only needs to exist for around 10 days of the month instead of all 30.
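
The sketch below illustrates that create/attach/back up/detach/delete flow with boto3. All IDs, the availability zone, the device name, and the volume size are placeholders, and the actual format/mount and MySQL backup steps on the instance are intentionally left out; treat it as an outline under those assumptions rather than my exact script.

```python
"""Sketch of the on-demand backup-volume flow: create, attach, back up, detach, delete."""
import boto3

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical MySQL instance
AZ = "us-east-1a"                     # must match the instance's availability zone
DEVICE = "/dev/sdf"                   # device name to expose the volume under

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Create the temporary 1 TB gp3 backup volume.
volume_id = ec2.create_volume(AvailabilityZone=AZ, Size=1024, VolumeType="gp3")["VolumeId"]
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

# 2. Attach it to the instance.
ec2.attach_volume(VolumeId=volume_id, InstanceId=INSTANCE_ID, Device=DEVICE)
ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

# 3. On the instance: format/mount the device, run the MySQL backup,
#    and upload the dump to another region (omitted from this sketch).

# 4. Detach and delete the volume so it is only billed while it exists.
ec2.detach_volume(VolumeId=volume_id)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
ec2.delete_volume(VolumeId=volume_id)
```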

Hopefully, this case gives you more ideas for optimising EBS volumes.

6. (Optional) Optimise the application I/O to reduce IOPS needed.

At a more advanced level, our application sometimes uses more IOPS than it should. For example, I have a Grafana server running on a gp3 EBS volume, and Grafana uses Collectd and the Graphite engine to collect metrics from the entire system. Initially, for an unknown reason, it kept the volume under sustained pressure for long periods, exceeding the default 3,000 IOPS and going above 6,000 IOPS. To cope with this, I had to increase the provisioned IOPS temporarily.
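
If you want to check whether a volume is exceeding its baseline before paying for more IOPS, a hedged sketch like the one below can help: it pulls the VolumeReadOps/VolumeWriteOps metrics from CloudWatch and converts them to average IOPS per period. The volume ID is a placeholder and boto3 with configured credentials is assumed.

```python
"""Sketch: estimate a volume's consumed IOPS from CloudWatch metrics."""
from datetime import datetime, timedelta, timezone
import boto3

VOLUME_ID = "vol-0123456789abcdef0"   # placeholder
PERIOD = 300                          # seconds per datapoint

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=6)

for metric in ("VolumeReadOps", "VolumeWriteOps"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName=metric,
        Dimensions=[{"Name": "VolumeId", "Value": VOLUME_ID}],
        StartTime=start,
        EndTime=end,
        Period=PERIOD,
        Statistics=["Sum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        # Sum of operations over the period divided by seconds = average IOPS.
        print(f"{point['Timestamp']:%H:%M} {metric}: {point['Sum'] / PERIOD:.0f} IOPS")
```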

Later, I figured out that the Collectd service had the “rrd” plugin enabled, a plugin that collects performance metrics and stores them in RRD (Round-Robin Database) files. I didn’t need this plugin, and simply disabling it brought usage down to approximately 2,500 IOPS, a significant drop, though still fairly close to the 3,000 IOPS baseline. Around the same time, I found a new Graphite engine written in Golang rather than the original Python, which is claimed to be significantly faster; see go-carbon for more information. After some research I upgraded to the Golang-based Graphite engine, which brought usage down to only about 1,500 IOPS. This improvement not only reduces the cost of provisioned IOPS but also noticeably improves Grafana performance.

From this experience, I realised that the solution sometimes lies at the application level rather than in the underlying hardware.

7. Clean up old and unnecessary EBS snapshots.

There’s a common scenario where you have a lot of EBS snapshots, some of them so old that they are no longer useful for recovery. There is no reason to keep paying a lot of money for them.

That’s where Amazon Data Lifecycle Manager comes in: with a pre-defined retention period it can delete old and unnecessary EBS snapshots for us. For example, we can create creation-and-retention policies that target volumes by tag (e.g. backup-policy:keep-7-daily-recent-backups).
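
A hedged sketch of such a policy created through boto3 is shown below: it keeps 7 daily snapshots of every volume carrying the example tag. The execution role ARN, tag key/value, and schedule times are placeholders, and the PolicyDetails structure should be double-checked against the Data Lifecycle Manager documentation for your use case.

```python
"""Sketch: a Data Lifecycle Manager policy keeping 7 daily snapshots of tagged volumes."""
import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Keep 7 recent daily snapshots of tagged volumes",
    State="ENABLED",
    PolicyDetails={
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        # Only volumes carrying this tag are covered by the policy.
        "TargetTags": [{"Key": "backup-policy", "Value": "keep-7-daily-recent-backups"}],
        "Schedules": [
            {
                "Name": "Daily",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": 7},
                "CopyTags": True,
            }
        ],
    },
)
```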

8. Implement object-targeted retention policies

Suppose you have a cron task that creates daily EBS snapshots of critical volumes, such as MySQL, RabbitMQ, or a mail server. Depending on the snapshot types, we generally need a policy along the lines of: keep daily snapshots for one week, weekly for one month, and monthly for one year, and delete the snapshots that are no longer needed each time a new one is created.

In some cases, we might not want to apply the same retention policy to all snapshots. For instance, we might want a shorter retention for snapshots of some EBS volumes while keeping more snapshots of others. In my case, I have EBS snapshots of two different MySQL volumes; call them mysql-product and mysql-customer for clarity. I want longer retention for snapshots of the mysql-product volume, as it’s more important, and shorter retention for snapshots of the mysql-customer volume. I can create a script to delete EBS snapshots of both volumes with the policy below:

  • Snapshots of the mysql-product volume: keep the 3 most recent daily backups, 4 most recent weekly backups, and 8 most recent monthly backups.
  • Snapshots of the mysql-customer volume: keep the 2 most recent daily backups, 2 most recent weekly backups, and 4 most recent monthly backups.

I once wrote a Golang script that deletes old EBS snapshots of a supplied volume-id while keeping a chosen number of snapshots. Please read the blog post “A CLI tool to delete a bulk AWS snapshots and keep the expected snapshots based on snapshot age” if you’re interested.

The reason for using a custom script instead of Amazon Data Lifecycle Manager is that it lets us tailor the retention logic to requirements that Data Lifecycle Manager alone can’t express.
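
As a much simplified Python sketch of that idea (my real tool is the Golang CLI mentioned above), the function below keeps only the N most recent snapshots of a given volume and deletes the rest; it handles a single “keep the newest N” rule rather than the full daily/weekly/monthly policy, and the volume ID and counts are placeholders.

```python
"""Sketch: delete all but the N most recent snapshots of a given volume."""
import boto3


def prune_snapshots(volume_id: str, keep: int, dry_run: bool = True) -> None:
    ec2 = boto3.client("ec2")
    snapshots = []
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    ):
        snapshots.extend(page["Snapshots"])

    # Newest first; everything after the first `keep` entries gets deleted.
    snapshots.sort(key=lambda s: s["StartTime"], reverse=True)
    for snapshot in snapshots[keep:]:
        print(f"Deleting {snapshot['SnapshotId']} from {snapshot['StartTime']:%Y-%m-%d}")
        if not dry_run:
            ec2.delete_snapshot(SnapshotId=snapshot["SnapshotId"])


# e.g. keep only the 3 most recent snapshots of a (placeholder) mysql-customer volume
prune_snapshots("vol-0123456789abcdef0", keep=3)
```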

9. Strategically choose cheaper storage tiers

Amazon EBS snapshots offer two storage tiers (pricing examples for the N. Virginia region; check the pricing for your own region):

  • Standard: $0.05/GB-month for storing the snapshot, with no charge for restoring it.
  • Archive: $0.0125/GB-month for storing the snapshot, plus $0.03 per GB of data retrieved when restoring it. You can read here for more detail on how the cost is calculated.

Which tier to use depends on your backup strategy. You can archive every snapshot if you think you’ll rarely need to access them, or mix the tiers like I did. Having implemented the retention policy from tip 8 above, I keep the 3 most recent daily backups, 4 most recent weekly backups, and 8 most recent monthly backups. For the recent daily/weekly snapshots, I’ll likely need quick access when recovering data, so I keep them in the Standard tier. For the old monthly snapshots, say older than 65 days, I move them to the Archive tier, as I rarely need to access those. And if you don’t actually access them, storing them costs just $0.0125/GB-month.
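
A hedged sketch of that archiving step with boto3 is shown below: it moves self-owned snapshots older than 65 days to the archive tier. The cutoff mirrors the example above, and real code needs error handling, since some snapshots (for example those backing registered AMIs) cannot be archived.

```python
"""Sketch: move snapshots older than 65 days to the EBS Snapshots Archive tier."""
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=65)

paginator = ec2.get_paginator("describe_snapshots")
for page in paginator.paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        already_archived = snapshot.get("StorageTier") == "archive"
        if snapshot["StartTime"] < cutoff and not already_archived:
            print(f"Archiving {snapshot['SnapshotId']} from {snapshot['StartTime']:%Y-%m-%d}")
            ec2.modify_snapshot_tier(SnapshotId=snapshot["SnapshotId"], StorageTier="archive")
```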

10. Delete snapshots associated with the deregistered AMIs

As you might know, when we deregister an AMI, the snapshots associated with it are not automatically deleted. We have to find these unused snapshots and clean them up ourselves. Checking this periodically, as we did with unattached volumes, helps us avoid hidden costs we don’t usually notice.
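
One way to hunt for these, sketched below with boto3: collect the snapshot IDs still referenced by AMIs we own, then flag self-owned snapshots whose description looks like it was written by CreateImage but which no registered AMI uses anymore. The description match is a heuristic, so treat the output as candidates to review, not an automatic delete list.

```python
"""Sketch: find snapshots that appear to belong to AMIs which no longer exist."""
import re
import boto3

ec2 = boto3.client("ec2")

# Snapshot IDs still referenced by AMIs we own.
in_use = set()
for image in ec2.describe_images(Owners=["self"])["Images"]:
    for mapping in image.get("BlockDeviceMappings", []):
        if "Ebs" in mapping and "SnapshotId" in mapping["Ebs"]:
            in_use.add(mapping["Ebs"]["SnapshotId"])

paginator = ec2.get_paginator("describe_snapshots")
for page in paginator.paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        looks_like_ami_snapshot = re.search(r"for ami-[0-9a-f]+", snapshot.get("Description", ""))
        if looks_like_ami_snapshot and snapshot["SnapshotId"] not in in_use:
            print(f"Candidate for cleanup: {snapshot['SnapshotId']}  {snapshot['Description']}")
```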

Conclusion

These are the 10 tips I usually apply when optimising EBS volumes. The easiest one to start with is migrating gp2 to gp3; you will see an obvious change in your bill right away ^_^. AWS cost optimisation is an ongoing challenge, and we can optimise from many angles. With these 10 tips, we can build up a strong habit that adds more value wherever we work. I hope you find even more ways, and if you have other ideas, please feel free to let me know in the comments below. Happy reading!

