Automatic DB backups from VPS to object storage
I run all my projects on VPSes hosted by DigitalOcean. The code is stored in version control, which I back up regularly, but the production databases live only on their respective VPSes.
Losing a production database could be a death sentence for my projects, so I wanted to ensure that the data is safe even if data on a VPS is destroyed.
I already create daily snapshots of production databases locally on each VPS, and these snapshots are stored for 4-6 weeks. What I wanted to do is back these snapshots up to a different service provider that operates in a separate data center.
1. Install rclone
I chose rclone to perform the backups. Rclone is a command-line program for managing files in cloud storage. It includes many features to ensure data integrity and has been widely used for years.
The first step is to install rclone on the VPS (see the rclone installation instructions):
curl https://rclone.org/install.sh | sudo bash
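If the install script succeeds, the binary should be on your path. A quick way to confirm is to print the version:

```shell
# confirm rclone is installed and reachable from the shell
rclone version
```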
2. Select a cloud provider
Rclone can back up to many different providers. I chose Backblaze B2, as they have a good reputation and great pricing. Their B2 Cloud Storage product is free for the first 5GB. Since my databases are usually just a few megabytes, I am well within the free tier limit.
Many online sources recommend backing up to two different providers. However, I already have a copy of the databases both on the VPS and on a volume storage which is attached to the VPS. Additionally, I usually have a recent copy of the production database on my development machine. Thus, I feel like one additional cloud provider is sufficient for the data to be safe for all practical purposes.
I created an account on the Backblaze website, and used the browser-based dashboard to create an App Key for rclone to use.
3. Configure rclone to use the provider
The setup process was simple: run the following command and follow the prompts:
rclone config
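If you prefer a non-interactive setup (for example when provisioning a VPS with a script), the same remote can be created in a single command. The key ID and application key below are placeholders for your own credentials:

```shell
# create a remote named "b2" of type "b2" without the interactive prompts
# (replace the placeholders with the App Key credentials from the Backblaze dashboard)
rclone config create b2 b2 account=YOUR_KEY_ID key=YOUR_APP_KEY
```

Note that this writes the credentials to rclone's config file, so make sure that file is readable only by the backup user.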
I named the Backblaze B2 remote b2.
4. Back up the files
Backblaze B2 (and most cloud providers) use buckets to store files. Buckets are basically root-level folders. You can create a bucket using the rclone tool:
rclone mkdir b2:postgres-backups
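To verify that the bucket was created, you can list the buckets on the remote:

```shell
# list top-level buckets on the b2 remote
rclone lsd b2:
```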
Now we can back up the folder from the VPS:
rclone sync [path to VPS folder] b2:postgres-backups
After running the command, you can open the Backblaze dashboard in your browser to verify that the files have been copied there.
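Besides the dashboard, rclone itself can verify the transfer. A dry run previews what sync would transfer or delete without touching anything, and the check command compares source and destination afterwards. The local path below is a placeholder:

```shell
# preview what would be transferred or deleted, without making changes
rclone sync --dry-run /var/backups/postgres b2:postgres-backups

# compare the files on the VPS against the files in the bucket
rclone check /var/backups/postgres b2:postgres-backups
```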
I use the sync command, which deletes files from Backblaze that have also been deleted on the VPS. I don't want to keep backups for longer than the 4-6 weeks they are stored on the VPS. There are other commands available if your needs are different; they are listed in the rclone docs.
5. Schedule daily backups
Once you know that everything is working, add the command to crontab on your VPS:
crontab -e
Then in the file that is opened, add the following line:
0 5 * * * /usr/bin/rclone sync [path to VPS folder] b2:postgres-backups
This entry syncs the backup folder to the cloud every day at 5am.
Note that I provide the full path to the rclone executable to prevent issues with the CLI tool not being in the shell path. You can find where the rclone executable is located by running which rclone in the VPS terminal.
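Cron jobs fail silently by default, so it can be worth writing rclone's output to a log file. A variant of the crontab line above with logging enabled (the paths here are assumptions, adjust them to your setup):

```shell
# crontab entry: daily sync at 5am, with rclone logging to a file
0 5 * * * /usr/bin/rclone sync /var/backups/postgres b2:postgres-backups --log-file=/var/log/rclone-backup.log --log-level=INFO
```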
Conclusion
I now have a robust backup strategy in place for all my production databases.
There are some things I can still do to increase the safety even further. Most notably, Backblaze offers a so-called Object Lock, which ensures that files uploaded to buckets remain immutable for a certain period of time.
Additionally, Backblaze offers Lifecycle Settings for buckets, which can be used to delete files automatically after a certain period. If I combined Object Lock with auto-deletion, I would be safe from ransomware attacks, as an intruder would not be able to delete any backups from the cloud storage even with unrestricted access to the VPS.
However, the basic implementation above makes me feel safe enough about the data that I will not modify it for now.