A quick guide to backups using tar


Though tar is widely used for archiving, it is rarely used for daily backups because it has no incremental capability - or at least, most people don't think it has. In fact, the GNU version of tar has a perfectly good mechanism for creating and restoring incremental archives; it just isn't very well documented on the man page, and you have to hunt around (the full GNU tar manual does cover it) to find a proper description.

It works by storing additional metadata in a separate file called a snapshot file. Let me illustrate using a miniature example - let's suppose I start on 'day 1' with a directory called mypics that contains three files:

$ ls ~/mypics
caption.jpg  storm1.jpg  sunset1.jpg

If I create a tar archive of this I'll get all three files in it. It is effectively our 'level 0' backup:

$ cd ~/mypics
$ tar cvf /backups/mypics.0.tar -g mypics.snar .

The first argument to tar (cvf) is actually a set of three options. c means create the archive, v means verbose (that is, list the names of the files as they are written to the archive) and f means 'the next argument is the name of the file to write the archive to'. I am assuming here that /backups is a mount point for a filesystem from an NFS server, or perhaps for an external disk drive.

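The -g flag is the interesting one. It tells tar to keep a record of what has been archived (and when) in the snapshot file mypics.snar. Because I've given the snapshot file as a relative path, it is created in the current directory, which means it lands inside the very directory I'm backing up; you may prefer to keep it somewhere outside, such as alongside the archives. Finally, the insignificant-looking '.' at the end of the command is the name of the directory I want to archive; in this case, the current directory.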
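If the single-letter options feel cryptic, the same command can be spelled with GNU tar's long option names. This is only a sketch: it assumes GNU tar, and it runs in a throwaway directory from mktemp (the `work` variable and its layout are my own invention) so nothing real gets touched. I've also put the snapshot file outside the directory being archived.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. All paths live under a scratch directory.
set -eu
work=$(mktemp -d)                      # hypothetical sandbox location
mkdir "$work/mypics" "$work/backups"
cd "$work/mypics"
touch caption.jpg storm1.jpg sunset1.jpg

# Long-option spelling of: tar cvf <archive> -g <snapshot> .
tar --create --verbose \
    --file="$work/backups/mypics.0.tar" \
    --listed-incremental="$work/mypics.snar" \
    .
```

The long names are more typing, but they make a backup script much easier to read six months later.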

By day 2 I've added a new file to my directory called baby.jpg. I create another archive. It contains only the new file and is our 'level 1':

$ tar cvf /backups/mypics.1.tar -g mypics.snar .

I can continue on day 3, creating a level 2 backup like this:

$ tar cvf /backups/mypics.2.tar -g mypics.snar .

Please be clear that the digits I've put in the output filenames are only for my benefit and in no way control what level my archive will be. That's all handled by the snapshot file mypics.snar. As long as I keep updating the same snapshot, each archive will be incremental to the previous one.
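To convince yourself that the snapshot file really is doing the work, you can replay days 1 and 2 in a sandbox and list what ended up in each archive. This is a minimal sketch, assuming GNU tar; the scratch directory and the variable names (`work`, `level0`, `level1`) are mine, and the files are empty stand-ins for real pictures.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. Replays day 1 and day 2 in a scratch directory.
set -eu
work=$(mktemp -d)                      # hypothetical sandbox location
mkdir "$work/mypics" "$work/backups"
cd "$work/mypics"
touch caption.jpg storm1.jpg sunset1.jpg

# Day 1: level 0. The snapshot file is kept outside the archived directory.
tar cf "$work/backups/mypics.0.tar" -g "$work/mypics.snar" .

# Day 2: one new file, then another archive against the same snapshot.
touch baby.jpg
tar cf "$work/backups/mypics.1.tar" -g "$work/mypics.snar" .

# List the member names in each archive (t = list, no extraction).
level0=$(tar tf "$work/backups/mypics.0.tar")
level1=$(tar tf "$work/backups/mypics.1.tar")
printf 'level 0:\n%s\n\nlevel 1:\n%s\n' "$level0" "$level1"
```

The level 1 listing should name the new file (plus an entry for the directory itself) but none of the three files already captured at level 0.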

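OK, now let's assume that for some reason we lost the entire content of the mypics directory and need to restore from the backup. Because the archives were created relative to '.', I would first cd into the (re-created, empty) mypics directory and then restore each of the levels in order: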

$ tar xvf /backups/mypics.0.tar -g /dev/null
$ tar xvf /backups/mypics.1.tar -g /dev/null
$ tar xvf /backups/mypics.2.tar -g /dev/null

Even when restoring, you still need the -g flag to get the incremental behaviour, but in this case it does not actually need the snapshot file. It is conventional to give /dev/null as a placeholder argument here, but anything will do. When extracting from the incremental backup, tar attempts to restore the exact state the filesystem had when the archive was created. In particular, it will delete those files in the filesystem that did not exist in their directories when the archive was created.
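The deletion behaviour is easiest to see in a sandbox. The sketch below assumes GNU tar and uses a scratch directory of my own devising: it takes a level 0, then removes one file and adds another before the level 1, and finally restores both levels into an empty directory.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. Shows that an incremental restore also
# removes files that had gone by the time the later archive was made.
set -eu
work=$(mktemp -d)                      # hypothetical sandbox location
mkdir "$work/mypics" "$work/backups" "$work/restore"
cd "$work/mypics"
touch caption.jpg storm1.jpg sunset1.jpg
tar cf "$work/backups/mypics.0.tar" -g "$work/mypics.snar" .

# Day 2: one file deleted, one file added, then the level 1.
rm storm1.jpg
touch baby.jpg
tar cf "$work/backups/mypics.1.tar" -g "$work/mypics.snar" .

# Restore the levels in order into an empty directory.
cd "$work/restore"
tar xf "$work/backups/mypics.0.tar" -g /dev/null   # brings back storm1.jpg
tar xf "$work/backups/mypics.1.tar" -g /dev/null   # should remove it again
restored=$(ls)
echo "$restored"
```

If you leave off the -g /dev/null on the second extraction, tar has no reason to treat the archive as incremental, and storm1.jpg would be left lying around in the restored directory.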

The above scheme creates a new level of backup each day. An alternative scheme might be to do a level 0 archive to begin with, then just a level 1 on each following day. Of course, the level 1's will gradually get larger, but this scheme makes it a little easier to restore from the archive as you only need to keep the level 0 and the most recent level 1. This requires some manual management of the snapshot file.

In particular, you need to leave the snapshot file from the level 0 backup untouched and create a working copy of it to use for the level 1 backup on day 2. On day 3 you'd again make a working copy of the original snapshot file to make your next level 1. On day 2 you'd do something like:

$ cp mypics.snar mypics.snar-2
$ tar cvf /backups/mypics.day2.1.tar -g mypics.snar-2 .

and on day 3 you'd do it again:

$ cp mypics.snar mypics.snar-3
$ tar cvf /backups/mypics.day3.1.tar -g mypics.snar-3 .
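The copy-then-archive step is the same every day, so it is a natural candidate for a small shell function. Here is a sketch, again assuming GNU tar and a scratch directory; the function name `level1` and the variable names are hypothetical, not anything tar provides.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. One level 0, then a fresh level 1 each day,
# always taken against a copy of the untouched level 0 snapshot.
set -eu
work=$(mktemp -d)                      # hypothetical sandbox location
mkdir "$work/mypics" "$work/backups"
cd "$work/mypics"
touch caption.jpg storm1.jpg sunset1.jpg
tar cf "$work/backups/mypics.0.tar" -g "$work/mypics.snar" .

# Hypothetical helper: $1 is the day number used in the file names.
level1() {
    day=$1
    cp "$work/mypics.snar" "$work/mypics.snar-$day"
    tar cf "$work/backups/mypics.day$day.1.tar" -g "$work/mypics.snar-$day" .
}

touch baby.jpg       # added on day 2
level1 2
touch holiday.jpg    # added on day 3
level1 3

# The day 3 archive should hold everything new since level 0.
day3=$(tar tf "$work/backups/mypics.day3.1.tar")
echo "$day3"
```

Because tar only ever writes to the working copies, the level 0 snapshot stays as it was on day 1, and each day's level 1 contains every change since then.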

Six obvious things about backups

  1. The most important thing about backups is not that you choose the latest, fastest, super-compresso technology, but that you actually make sure you do them, in some reasonable way, on a consistent, regular basis. Doing backups is a bit like paying insurance premiums - you kinda hope that you're never going to need to make a claim, and the temptation is not to do them at all.
  2. Making backups of a filesystem on to the same hard drive that the filesystem is on is a bit like asking Carla Sarkozy for a date - i.e. a complete waste of time. Don't do it this way.
  3. If you back up on to another machine on your network, keep in mind that if your machine gets hacked, the backup server might too. (There is nothing more reassuring than a ten-foot physical gap between your local network and an external USB drive sitting on a shelf.)
  4. If you back up on to removable media, label them!
  5. Consider storing external backup media (such as CDs or hard drives) off-premises. I find my next-door neighbours quite co-operative in this. Of course, you are giving them access to all your private data, so you need to trust them (or assume they won't figure out how to access it).
  6. Whatever backup method you use, make sure you can actually restore files. Do a 'fire drill' - pretend you've lost some files, then go through the process of recovering them.
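A fire drill along the lines of point 6 can be as simple as restoring into a scratch directory and comparing the result with the live data. The sketch below assumes GNU tar and the standard diff utility; the directory layout and the `drill_result` variable are made up for the illustration, and the 'pictures' are just small text files.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar and diff. Restores into a scratch directory
# (never over the live data) and compares the two trees.
set -eu
work=$(mktemp -d)                      # hypothetical sandbox location
mkdir "$work/mypics" "$work/backups" "$work/drill"
echo "pretend image data" > "$work/mypics/storm1.jpg"
echo "more pretend data" > "$work/mypics/sunset1.jpg"
(cd "$work/mypics" && tar cf "$work/backups/mypics.0.tar" -g "$work/mypics.snar" .)

# The drill itself: restore, then compare recursively.
(cd "$work/drill" && tar xf "$work/backups/mypics.0.tar" -g /dev/null)
if diff -r "$work/mypics" "$work/drill" >/dev/null; then
    drill_result=ok
else
    drill_result=mismatch
fi
echo "fire drill: $drill_result"
```

On a real system the live directory may have changed since the backup was taken, so a mismatch is not necessarily a failure; what matters is that the restore runs and the files you expect come back.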


Your comments

hmmmmm not bad

Thanks for the post, though I see why rsync is the default and preferred way of doing backups.

star may prove more useful

I've taken to using star instead of tar as it allows you to back up individual filesystems without crossing filesystem boundaries, so you can have separate backup files for /, /home etc. This makes things a little easier when restoring.

Imagine you have a single tar file which contains a backup of the entire system and then wish to recover the file /home/mydir/myfile - you'd issue the command tar xvf /path/to/backup.tar ./home/mydir/myfile and tar would then have to work its way through the backup file until it gets to /home/mydir/myfile which, depending on how big the tar file is, can take a while. If star were used, you could do something like this:

cd /home
star -xv file=/path/to/home.star mydir/myfile

In addition, star is reputedly faster and will cater for ACLs on files.

Yes, rsync is useful, but there's always the problem that if, say, you make a mistake when editing a file, or otherwise trash it, that file could get rsync'd to your backup system/drive before you get the chance to grab the backup copy! :)

Also for SELinux attributes

star will save the extended SELinux attributes [ exustar ] but tar will not by default. Rsync will not by default, either.
