How to Overcome Rsync Issues When Migrating Large Amounts of Data Between EBS Volumes

If you run systems on AWS, Amazon Elastic Block Store (EBS) volumes are an integral part of them, and migrating data between EBS volumes is a task you may face at some point. In this post, I want to share a problem we hit while migrating terabytes of MySQL data from one EBS volume to another. A migration like this can happen for many reasons: backup, recovery, or simply switching volumes for cost optimisation. In my case, the goal was cost optimisation. The database was initially allocated a huge volume, and over time, after we reduced the amount of data stored in it, it no longer needed that much space, so it made sense to downsize the EBS volume to an appropriate size.

NOTE: if you want to learn about other strategies for optimising AWS costs, particularly around EBS, I've recently written AWS EBS Cost Optimization: 10 Tips to Reduce Your AWS Bill; I hope it helps you in some cases.

Situation

I was storing 1.2TB of MySQL data on an EBS volume (gp3, initially 3TB), which I'll call Volume 1. The goal was to migrate the data from Volume 1 to a newly attached EBS volume, Volume 2 (gp3, 1.4TB). Both volumes use the gp3 baseline of 3,000 IOPS and 125 MiB/s of throughput. There are multiple ways to copy the data over; on Linux, we can use commands such as cp, mv, rsync, etc. I chose rsync for efficient and robust copying. For this much data, I prefer to run rsync in the background so I don't have to keep the SSH session open while waiting for the copy to finish. The command looks like:

rsync -avzh /mnt/mysql/ /mnt/mysql-new/ &

While the command runs fine with a small amount of data (I don't know the exact cutoff, but in my tests anything below 50GB worked), migrating TBs of data always failed after roughly 50GB had been copied, with an error similar to:

rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [sender=3.1.3]

Wrapping the command in a bash script and executing it with ./run-rsync.sh &, or combining it with nohup to shield it from signals, for example:

nohup rsync -avzh /mnt/mysql/ /mnt/mysql-new/ &

None of these methods solved the issue. I also tried the cp and mv commands, but they hit the same problem. Then I realised that when I keep the SSH connection to the EC2 instance open and run:

rsync -avzh /mnt/mysql/ /mnt/mysql-new/

it does work! But that also means I can't close the SSH session until the rsync process finishes, which may take a day or longer. And if I suddenly lose internet connectivity, the SSH session is terminated, which kills the rsync process as well, forcing me to start rsync all over again. That quickly becomes annoying.

It's hard to pin down the exact cause, but I can think of a few potential ones:

  • System resource limits: an operating-system Out-Of-Memory (OOM) event, triggered when the system exhausts memory and swap, will kill processes, rsync included. My EC2 instance doesn't fall into this case.
  • EBS volume type limits: rsync may be hitting the volume's baseline IOPS or throughput threshold; see the EBS volume types documentation for more information. I tried a volume provisioned with 10,000 IOPS and 300 MiB/s of throughput, but still hit the error.
  • EC2 instance limits: the instance type also caps the aggregate bandwidth and IOPS it can drive to EBS volumes, so a bottleneck is likely even when the volume itself is provisioned with higher performance. I tried limiting the transfer rate with rsync's --bwlimit option, but it didn't help.
  • Many other factors can lead to this issue, depending on where your storage is hosted.
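To check whether a copy is actually pushing against an I/O limit, you can watch the per-device counters while rsync runs. This is a rough Linux-specific sketch reading /proc/diskstats (field positions follow the kernel's diskstats format, and sectors there are always 512-byte units):

```shell
# Print cumulative read/written MiB per block device from /proc/diskstats.
# Fields: $3 = device name, $6 = sectors read, $10 = sectors written
# (sectors in /proc/diskstats are always 512-byte units).
awk '{ printf "%-12s read: %.1f MiB  written: %.1f MiB\n",
       $3, $6 * 512 / 1048576, $10 * 512 / 1048576 }' /proc/diskstats
```

Sampling this twice a few seconds apart and diffing gives the actual throughput, which you can compare against the volume's and the instance's EBS limits; on the AWS side, CloudWatch's VolumeReadBytes and VolumeWriteBytes metrics show the same picture.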

NOTE: You might not see the same issue with different EBS volume types or a different amount of data to migrate.

Solution

As mentioned, rsync works if I keep open the SSH session where I run it. To run rsync in the background without hitting the error (code 20)..., I found the following approach works:

SSH to the EC2 instance where you want to run rsync, and open a separate screen session for it:

screen -S rsync

Execute rsync as usual:

rsync -avzh /mnt/mysql/ /mnt/mysql-new/

Detach from the rsync screen by pressing:

Ctrl+A, then d

This brings us back to the original shell, where we can terminate the SSH session without interrupting the running rsync process.

This lets me run rsync in the background successfully. Even if you're not on AWS, this solution may be worth trying if you hit the same issue. I hope it helps. Thanks for reading.



By Binh
