Using rsync and Time Machine for web site backups

Jeff Atwood has declared December 14th, 2009 as International Backup Awareness Day, and with good reason: his blog, Coding Horror, experienced catastrophic data loss. Long story short, a disk drive failure on an external hosting server resulted in complete data loss, and no recent full backup was available. A lengthy, manual restore procedure is under way. Lesson learned: Jeff suggests maintaining your own backups of any content hosted on someone else’s server. I believe this is an excellent idea, and here I present how I back up this web site, along with several others and the contents of my home computer.

Why do your own backups?

First, we should establish that performing your own backups is desirable and necessary. Most third-party hosting services provide backups. However, running your own backup scheme has several advantages:

  • You control the retention policy – You can decide how many versions to keep and for how long.
  • Fast access to backups – You control how accessible your backed-up data is, and there should be no lengthy retrievals over slow network connections.
  • Review for unexpected file changes – While you are backing up, you can look for unexpectedly changed files, which can indicate that the web site has been hacked.
  • Trust no one – Be responsible for the data you own. Whether it was created by you or by others, you have a responsibility to ensure it is protected and safe at all times.

You need 3 backups, not one

Convenience – You need a drive attached to the computer at all times so that backups can be automated and convenient. Any inconvenience or barrier to performing the backup, such as having to plug in a drive, makes it less likely that you have a recent copy of the data.

Theft – If someone breaks in and steals your home computer, they will steal any and all attached backup drives. You need a copy not attached to the computer, preferably in a fireproof safe.

Fire – If your house burns down, it is likely any hard drive on site will be destroyed.  You need an off-site copy.

Lastly, RAID (mirrored) is not a backup – it only protects you from simple hard drive mechanical failures.  It will not protect you from application level data corruption,  accidental updates or deletions, physical theft, fire, malicious users, disgruntled employees or associates, etc.  Therefore we will not discuss it further.

Cloud Backup vs Local Hard Drives

A simple way to achieve much of the above would be to use a cloud-based online backup service. I cannot recommend this approach simply because I have no practical experience with it. It sounds good in theory, but here is what I don’t like about cloud backup: 1) I don’t like regularly recurring expenses; I’d rather drop $100 on an external USB hard drive here and there. 2) Performance, particularly for data retrieval, can be a problem with the various online services. 3) Depending on what kind of data you are working with (multi-gigabyte virtual machine images, anyone?), upload speeds may be prohibitively slow, if for no other reason than the asymmetric upload/download speeds provided by most ISPs (DSL or cable). Lastly, depending on how paranoid you are, data privacy may be an issue for you.
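To put the upload concern in numbers, here is a rough back-of-the-envelope calculation. The 10 GB image size and the 1 Mbit/s upload rate are made-up figures for illustration, not measurements, and the function name upload_hours is hypothetical.

```shell
#!/bin/bash
# upload_hours SIZE_GB RATE_MBIT
# Prints the approximate hours needed to upload SIZE_GB gigabytes
# at RATE_MBIT megabits per second (decimal units, no protocol overhead).
upload_hours() {
    awk -v gb="$1" -v mbit="$2" \
        'BEGIN { printf "%.1f\n", (gb * 8 * 1000) / mbit / 3600 }'
}

# Hypothetical figures: a 10 GB virtual machine image over a 1 Mbit/s uplink.
upload_hours 10 1    # prints 22.2
```

Under those assumed figures a single image would take most of a day to upload, which is the kind of delay that makes a local drive attractive.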

Basic process

I use 1and1.com’s Linux-based web hosting and run several WordPress sites (including this one) and a VBulletin forum (http://speedbagforum.com). Both the WordPress and VBulletin web sites use Apache, MySQL, and PHP. The hosting company provides SSH access, which gives me a high level of control and flexibility and has enabled me to implement a highly (though not completely) automated backup solution for all of these web sites.

To back up the web sites, including the PHP/HTML files, MySQL databases, and all uploads/attachments, I simply run a bash script on my Mac. This script starts an SSH session and remotely runs another bash script on the Linux host, which uses mysqldump to back up the databases. The SSH login is passwordless for convenience, since I want to avoid any barriers to performing the backups; instead of passwords I use private/public key authentication. The script then invokes rsync to copy the files to my Mac, and finally runs another remote command to remove the SQL backup files. Since I use Time Machine on my Mac, it picks up the newly copied local files and backs them up to an external USB drive, which I swap monthly, rotating through several drives I own. One drive goes into the fireproof safe in the house; another goes to a relative’s house (also kept in a fireproof safe).
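For readers who have not set up key authentication before, a minimal sketch follows. It assumes OpenSSH's ssh-keygen is installed and that a key with an empty passphrase is acceptable (a passphrase-protected key loaded into ssh-agent is the safer alternative). The function name make_backup_key is hypothetical.

```shell
#!/bin/bash
# make_backup_key KEYFILE
# Creates an ed25519 key pair at KEYFILE if it does not exist yet,
# then prints the public half so it can be installed on the remote host.
# Assumption: an empty passphrase (-N "") is acceptable for this key.
make_backup_key() {
    keyfile="$1"
    if [ ! -f "$keyfile" ]; then
        ssh-keygen -q -t ed25519 -N "" -f "$keyfile" || return 1
    fi
    cat "$keyfile.pub"
}

# Example (not run here): append the public key to the remote account's
# authorized_keys file; this asks for the password one last time.
#   make_backup_key ~/.ssh/id_ed25519 |
#       ssh "(user)@(domain.com)" 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
```

Using the default path ~/.ssh/id_ed25519 means ssh and rsync should pick the key up without further configuration.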

The main bash script could be run automatically via cron. However, I prefer to run it manually. First, this allows me to check the web sites and make sure no major corruption has occurred (a hack, for example; there is no sense backing up a corrupted web site!). Second, it gives me a chance to review the changed files displayed by rsync and look for anything out of the ordinary (a common trick of malware writers is to spread malware via others’ web sites). You should always investigate any unusual or unexpected file changes.

Details

The bash script I run under Mac OS X is quite simple:

#!/bin/bash
# Replace the placeholder with your login on the remote host.
remote="(user)@(domain.com)"

echo "Backup of databases on remote server..."
ssh "$remote" ./databasebackup

echo "rsync all web files, including sql backups..."
rsync -avz --delete "$remote:~" /Users/tplatt/ForumDownload

echo "Remove sql backups on remote server..."
ssh "$remote" 'rm -v *.sql'

echo "Complete."

(user)@(domain.com) should be changed to your Linux login for your remote host. If you have not set up key authentication, you will be prompted for a password each time the script logs in.
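One caveat with the script as written: if the rsync step fails, the final step still deletes the SQL dumps on the server. A possible variant, sketched below, chains the steps so that each one runs only if the previous one succeeded. The function name run_backup and its two arguments are hypothetical; the commands are the same three as above.

```shell
#!/bin/bash
# run_backup REMOTE DEST
# Hypothetical wrapper around the three steps above. Each step runs only
# if the previous one succeeded, so the remote .sql files are removed
# only after rsync has copied them down.
run_backup() {
    remote="$1"
    dest="$2"
    echo "Backup of databases on remote server..." &&
    ssh "$remote" ./databasebackup &&
    echo "rsync all web files, including sql backups..." &&
    rsync -avz --delete "$remote:~" "$dest" &&
    echo "Remove sql backups on remote server..." &&
    ssh "$remote" 'rm -v *.sql' &&
    echo "Complete."
}

# Example (not run here):
#   run_backup "(user)@(domain.com)" /Users/tplatt/ForumDownload
```

The function returns a non-zero status as soon as any step fails, which also makes it easier to notice a failed run.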

First, the databasebackup bash script is executed on the remote host. This script uses the mysqldump command to dump all the database data into a file. Note that the file name contains the current date and time, which makes it unique and ensures we can retain multiple copies of the database file.

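#!/bin/bash
# Timestamp: date (YYYYMMDD) plus seconds since the epoch, so each file name is unique.
now=$(date +%Y%m%d-%s)
echo "Date is $now"

filename="dbbackup-spf-$now.sql"
echo "Dumping $filename..."
# Replace the parenthesized placeholders with your own values.
mysqldump -h"(server name)" -u"(database user)" -p"(password)" "(database name)" > "$filename"
# Show the last line of the dump so a complete backup can be confirmed.
tail -n 1 "$filename"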

The actual script that is run repeats the mysqldump steps for all databases. (server name) should be the fully qualified name of the MySQL server. (database user) is the MySQL login and (password) is the corresponding password. Lastly, (database name) is the name of the database. The end result of this script is one or more .sql files in the home directory on the remote host. The tail command outputs the last line of the backup file, which allows me to visually confirm a successful backup: if the line doesn’t indicate “Dump completed”, I know the database backup failed. A partial database backup is pretty much worthless. A successful dump ends with a line like this:

-- Dump completed on 2010-01-27 13:09:26
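The per-database repetition and the visual check of the last line could also be folded into a loop. The sketch below is one way to do that, not the script actually in use. The function names, the DB_HOST, DB_USER and DB_PASS variables, and the use of the database name in the file name are all assumptions made for illustration.

```shell
#!/bin/bash
# dump_completed FILE
# Succeeds only if the last line of FILE carries mysqldump's
# "Dump completed" marker.
dump_completed() {
    tail -n 1 "$1" | grep -q 'Dump completed'
}

# dump_all DATABASE...
# Dumps each named database to its own timestamped file and stops at the
# first dump that fails or looks incomplete.
# Assumption: DB_HOST, DB_USER and DB_PASS are set by the caller.
dump_all() {
    now=$(date +%Y%m%d-%s)
    for db in "$@"; do
        filename="dbbackup-$db-$now.sql"
        echo "Dumping $filename..."
        mysqldump -h"$DB_HOST" -u"$DB_USER" -p"$DB_PASS" "$db" > "$filename" || return 1
        if ! dump_completed "$filename"; then
            echo "Incomplete dump: $filename" >&2
            return 1
        fi
    done
}

# Example (not run here), with hypothetical database names:
#   dump_all forum blog
```

Because dump_all returns a non-zero status on an incomplete dump, a calling script can refuse to continue rather than relying on someone reading the output.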

Once the ./databasebackup script has completed, an rsync command is invoked. This command copies all files in the user’s home directory on the remote host into a local directory (on my iMac). The --delete option ensures that files deleted on the server are also removed from the local directory. When the rsync command executes, it displays a list of the files being transferred (the -v verbose option), which gives me a quick glance at any changed files.

So, given that deletions are mirrored, how can I retrieve an accidentally or maliciously deleted file? The final part of the equation is Time Machine running on the local Mac. Every hour the local directory is backed up to an external drive, where deleted files are retained, at least until space is needed. I can typically fit 2 months’ worth of web site backups on the external drive; your results may vary according to the size of the drive, the size of the backups, and any other data Time Machine is backing up for you. What if a file is deleted and I don’t notice for 3 months? I go to the drive stored in the safe, which will have older backups, or to the drive off-site.
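Because rsync's --delete option removes local files, it can be reassuring to preview what rsync would do before letting it touch anything. A minimal sketch follows; the function name preview_changes is hypothetical, while -n (--dry-run) and --itemize-changes are standard rsync options.

```shell
#!/bin/bash
# preview_changes SOURCE DEST
# Lists what rsync would transfer or delete, without changing anything.
preview_changes() {
    rsync -avzn --delete --itemize-changes "$1" "$2"
}

# Example (not run here):
#   preview_changes "(user)@(domain.com):~" /Users/tplatt/ForumDownload
```

If the preview shows unexpected changes or deletions, the real transfer can be postponed until the web site has been checked.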

In summary, ssh and rsync enable powerful and easy backups for your hosted web sites.  Combine those with Time Machine for Mac OS X and you have a powerful, simple backup solution that is quite comprehensive.

About Timothy Platt

I'm an all-around computer junkie, interested in many aspects of programming, operating systems, and enterprise IT technologies. I love Mac OS X, Linux, and Windows and I'm not particularly militant about any one over the other.