(April 2009)

In an adjoining article, I described the techniques I use to perform a daily backup of an overseas Linux server in a very efficient manner.

Windows machines are a different story, though.

An ideal Windows backup strategy must...

I searched for something like this and found many solutions that had some of the desired features... but none that had them all.

Challenges

Backup where?

Tapes and optical disks are not ideal backup storage - they require manual fiddling each time (change the tape, insert a DVD/Blu-ray disc, etc). We want the process to be completely automated: set it up and forget about it. Hard drives (e.g. external USB drives) and network shares fit this profile. In fact, since we intend to use techniques that only store modified/new data, we can start with enough storage for e.g. twice the size of our Windows drive: a 500GB external USB drive can store anywhere from hundreds to thousands of daily snapshots of a 250GB Windows machine. Why? Because the first snapshot takes up (at most) 250GB, and each subsequent one only needs space for the files that changed that day - and we rarely change more than 1GB of data in a single day (depending on disk usage patterns, of course - don't go searching for extreme scenarios). To see a more technical, Linux-based example of the inner workings of what we'll do, read the explanation of using hard links and rsync on my Linux backup page - or just take my word for it :‑)
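The arithmetic behind that claim can be sketched as follows. This is only a rough estimate under stated assumptions: the first snapshot costs the full size of the drive, every later snapshot costs only the total size of the files that changed that day, and filesystem metadata (directory entries, hard-link records) is ignored. The function name and parameters are illustrative, not part of any tool.

```python
def snapshots_that_fit(disk_gb: float, full_copy_gb: float, daily_change_gb: float) -> int:
    """Rough count of daily snapshots that fit on a backup disk.

    Assumes the first snapshot stores a full copy and each later one stores
    only the files that changed that day (unchanged files are hard links).
    Filesystem metadata overhead is ignored.
    """
    if daily_change_gb <= 0:
        raise ValueError("daily_change_gb must be positive")
    free_after_first = disk_gb - full_copy_gb
    if free_after_first < 0:
        return 0  # not even the first full copy fits
    return 1 + int(free_after_first // daily_change_gb)


# 500GB disk, 250GB Windows drive, about 1GB of changed files per day
print(snapshots_that_fit(500, 250, 1))     # 251
print(snapshots_that_fit(500, 250, 0.25))  # 1001
```

Note that the scripts further down keep a fixed rotation of 16 snapshots, so in practice the disk holds far fewer snapshots than this upper bound.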

Instantly navigable

Windows XP Professional comes with NTBackup. I am told that it works fine, and I don't doubt it. However, I want to access my backed-up data directly, not through yet another GUI. I want to be able to open last week's version of VeryImportantDocument.xls just by browsing with Explorer to that day's backup directory and double-clicking on it. Rsync and filesystem hard links provide the necessary functionality for this, so why would I use yet another application to "extract" my old version? Why should I have to decide between "full backups" and "incremental backups"? I want all my backed-up data accessible, all the time. And I can.
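To make the hard-link idea concrete, here is a minimal Python sketch of what rsync's --link-dest does for us: every snapshot directory is a complete tree you can browse, but files that did not change since the previous snapshot are hard links to the same on-disk data rather than new copies. It is an illustration, not a replacement for rsync: it assumes a file is unchanged when size and modification time (to the second) match, roughly like rsync's default quick check, and it ignores deletions, symlinks and permissions. The function name make_snapshot is hypothetical.

```python
import os
import shutil
from typing import Dict, Optional


def make_snapshot(src: str, dest: str, link_dest: Optional[str] = None) -> Dict[str, int]:
    """Copy the tree at src into dest, hard-linking files unchanged since link_dest.

    A file counts as unchanged when its copy in link_dest has the same size and
    the same modification time (whole seconds). Returns how many files were
    hard-linked and how many were copied.
    """
    counts = {"linked": 0, "copied": 0}
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dest_file = os.path.join(target_dir, name)
            st = os.stat(src_file)
            if link_dest is not None:
                prev_file = os.path.normpath(os.path.join(link_dest, rel, name))
                try:
                    prev = os.stat(prev_file)
                except FileNotFoundError:
                    prev = None
                if (prev is not None and prev.st_size == st.st_size
                        and int(prev.st_mtime) == int(st.st_mtime)):
                    # Unchanged: add a new directory entry for the same data.
                    os.link(prev_file, dest_file)
                    counts["linked"] += 1
                    continue
            # New or modified: store a real copy (copy2 keeps the mtime).
            shutil.copy2(src_file, dest_file)
            counts["copied"] += 1
    return counts
```

After two runs (make_snapshot(src, "1"), then make_snapshot(src, "0", link_dest="1")), an unchanged file in 0 and 1 shares one inode (os.stat(...).st_ino is identical), so it occupies disk space only once, and deleting the older directory simply drops one of the links.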

Your file is being used by another process... Sorry...

In sharp contrast to UNIX - where I have never seen any applications/filesystems enforcing draconian read/write access policies - there are a lot of files under Windows whose contents are simply not accessible:
(start a CMD prompt from an Administrative account)

C:\> cd c:\windows\system32\config
C:\WINDOWS\system32\config> copy SAM c:\
The process cannot access the file because 
it is being used by another process.
        0 file(s) copied.
The filesystem doesn't let us - these files were opened with exclusive access modes. The developers who built the relevant applications knew that these files use binary formats (e.g. registry hives, SQL Server files, etc), and since there is no guarantee that these files are in a consistent state at any given moment, they don't want us to read them. What we would read would probably be useless anyway... yet another reason why UNIX, with its ASCII-based configuration files under /etc, is much better than the registry - and, in the same vein, why Thunderbird, with its plain ASCII-based mbox format, is much better than the cryptic Outlook PSTs. I digress... (it's tough not to, when you see this kind of thing).

So how do we back these files up? There are very important files included in the "forbidden fruit" category... the registry hive, the SQL Server files, Outlook's PSTs - i.e. your mail! (unless you are wise enough to use Thunderbird, which has no such issues) - etc. Leaving these files out of the backup is simply not an option.

To cover this requirement, Microsoft took a page out of LVM snapshots and introduced the Volume Shadow Copy Service with Windows XP. In plain words, they developed the necessary drivers and services that allow a process to take a "frozen picture" of the filesystem and use that frozen picture for whatever purpose - backup applications being the primary clients of this feature. Since some applications' files would be in an inconsistent state if captured at an arbitrary moment, the Volume Shadow Copy Service also includes the necessary work-around: before actually taking the snapshot, it asks the appropriate applications to perform a sort of "commit", i.e. to bring their files to a consistent state.
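A toy model may help show why such a "frozen picture" is cheap to take. The sketch below illustrates the general copy-on-write idea behind LVM-style snapshots: taking a snapshot copies nothing, and a block's original content is preserved only the first time that block is overwritten afterwards. It is a conceptual model with hypothetical class and method names; it does not reproduce how the actual Volume Shadow Copy providers are implemented, nor the application "commit" step.

```python
from typing import Dict, List


class ToyVolume:
    """A list of blocks with copy-on-write snapshots (conceptual model only)."""

    def __init__(self, blocks: List[str]) -> None:
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> content at snapshot time.
        self.saved: List[Dict[int, str]] = []

    def take_snapshot(self) -> int:
        self.saved.append({})  # instant: nothing is copied yet
        return len(self.saved) - 1

    def write(self, index: int, data: str) -> None:
        for saved in self.saved:
            if index not in saved:
                saved[index] = self.blocks[index]  # preserve old content once
        self.blocks[index] = data

    def read_snapshot(self, snap: int, index: int) -> str:
        # Blocks not overwritten since the snapshot are read from the live volume.
        return self.saved[snap].get(index, self.blocks[index])


vol = ToyVolume(["boot", "registry v1", "mail v1"])
snap = vol.take_snapshot()
vol.write(1, "registry v2")
print(vol.read_snapshot(snap, 1))  # registry v1
print(vol.blocks[1])               # registry v2
print(len(vol.saved[snap]))        # 1
```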

Unfortunately, Volume Shadow Copy again left something to be desired: normal processes have no easy way to access these "shadow" volumes, since they are not visible via normal drive letters (they are low-level devices, e.g. \\?\Volume{785cc4a6-3d68-11d7-9cc5-505054503030}). Thanks to Adi Oltean, however, there is a method that uses two Microsoft tools (vshadow and dosdev) to create these snapshots and give them a normal drive letter - after which we can use the usual rsync-based snapshots to back everything up!
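vshadow records the name of the shadow device it creates in a small generated batch file, vss-setvar.cmd, which the scripts further down use through the SHADOW_DEVICE_1 variable. If you ever want to inspect those values programmatically, the Python sketch below extracts them. The file layout assumed here (plain SET NAME=value lines, everything else ignored) and the sample values are illustrative: the exact variables and device-name format depend on the vshadow version and on your machine.

```python
import re
from typing import Dict

SET_LINE = re.compile(r"^\s*SET\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.IGNORECASE)


def parse_setvar(text: str) -> Dict[str, str]:
    """Collect NAME=value pairs from the SET lines of a batch file."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if match:
            # CMD variable names are case-insensitive; normalise to upper case.
            # Trailing whitespace is stripped here for convenience (CMD would keep it).
            env[match.group(1).upper()] = match.group(2).rstrip()
    return env


# Made-up sample content, assumed to resemble a generated vss-setvar.cmd
sample = r"""@echo.
SET SHADOW_SET_ID={00000000-0000-0000-0000-000000000001}
SET SHADOW_ID_1={00000000-0000-0000-0000-000000000002}
SET SHADOW_DEVICE_1=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1
"""
print(parse_setvar(sample)["SHADOW_DEVICE_1"])
# \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1
```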

Rsync under Windows

Older versions of Cygwin (e.g. 1.5) had some issues: they couldn't handle UTF-8 filenames correctly, and also had issues with very long filenames/paths. Cygwin 1.7 fixed these issues for all applications.
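Since rsync.exe is a Cygwin program, it does not work with drive letters directly: Windows drives are addressed through Cygwin's /cygdrive prefix, which is why vss-exec.cmd below passes /cygdrive/b/ and /cygdrive/f/Backups/0/ to rsync. Cygwin ships a cygpath utility for this conversion; the Python function below is only a minimal illustration of the mapping, assuming the default /cygdrive prefix and rejecting UNC paths. The function name is hypothetical.

```python
import ntpath


def to_cygdrive(win_path: str, prefix: str = "/cygdrive") -> str:
    """Map a drive-letter Windows path to Cygwin's POSIX-style form.

    Assumes the default cygdrive prefix; UNC paths are rejected.
    """
    drive, rest = ntpath.splitdrive(win_path)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"not a drive-letter path: {win_path!r}")
    # Cygwin uses lower-case drive letters and forward slashes.
    rest = rest.replace("\\", "/")
    return f"{prefix}/{drive[0].lower()}{rest}"


print(to_cygdrive("F:\\Backups\\1"))  # /cygdrive/f/Backups/1
print(to_cygdrive("B:\\"))            # /cygdrive/b/
```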

Permissions

When backing up our files, we have to decide what to do with their permissions. We can of course invoke rsync with the "-a" parameter to preserve as much of them as possible, but this isn't necessarily wise: when we "rotate" the backup directories, we first remove the oldest one - and if we have preserved the original permissions of the files, we won't be able to remove some of them (e.g. the Windows system folders, which are marked as read-only), which will break the backup process. For my needs (I back up to an external USB drive that only I can access), I invoke rsync with --chmod=ugo=rwX, which makes everything readable and writable by everyone (plus executable/searchable for directories and for files that were already executable).
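In chmod syntax, ugo=rwX sets the user, group and other permissions to exactly read+write, plus execute (search) permission only for directories and for files that already had an execute bit - that is what the capital X means. The small Python function below just computes that result for a given mode, as a way to check the effect; it is an illustrative helper with a hypothetical name, not part of rsync.

```python
import stat


def ugo_equals_rwX(mode: int) -> int:
    """Permission bits left by chmod-style 'ugo=rwX' for an entry with st_mode `mode`."""
    perms = 0o666  # '=' replaces the old bits: read+write for user, group, others
    if stat.S_ISDIR(mode) or mode & 0o111:
        perms |= 0o111  # 'X': execute/search only for dirs or already-executable files
    return perms


print(oct(ugo_equals_rwX(stat.S_IFREG | 0o444)))  # 0o666  read-only file becomes writable
print(oct(ugo_equals_rwX(stat.S_IFDIR | 0o555)))  # 0o777  read-only folder becomes writable
print(oct(ugo_equals_rwX(stat.S_IFREG | 0o755)))  # 0o777
```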

The complete solution

As with all problems, there is no magic bullet that covers everyone's requirements. And this is where the UNIX philosophy shines: understand the simple tools that do one job - and do it well - and then "glue" them together with scripting to cover our specific needs.

In this case, I'll show you how I use an external USB drive to perform a daily backup of my Windows machine at work. The process, however, can be modified in many ways: e.g. to back up to a network share (an OpenSolaris/ZFS Samba share would be perfect: just remember to use rsync --inplace and snapshot the results via ZFS snapshots) or to create directories named after the day of the backup, etc. What follows is a very simple - yet fully functional - usage scenario of the appropriate tools.
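For the date-named variant mentioned above, the only extra logic needed is picking today's directory name and the most recent earlier snapshot to pass to --link-dest. A minimal Python sketch, assuming ISO-style YYYY-MM-DD directory names (which sort chronologically as plain strings); the function name is hypothetical:

```python
import datetime
import re
from typing import Iterable, Optional, Tuple

DATE_DIR = re.compile(r"\d{4}-\d{2}-\d{2}")


def plan_dated_snapshot(existing: Iterable[str], today: datetime.date) -> Tuple[str, Optional[str]]:
    """Return (new snapshot dir name, dir to use as --link-dest or None)."""
    new_name = today.isoformat()  # e.g. "2009-04-15"
    # ISO dates compare correctly as strings; ignore anything else in the folder.
    earlier = sorted(d for d in existing if DATE_DIR.fullmatch(d) and d < new_name)
    return new_name, (earlier[-1] if earlier else None)


print(plan_dated_snapshot(["2009-04-13", "2009-04-14", "notes"], datetime.date(2009, 4, 15)))
# ('2009-04-15', '2009-04-14')
```

With no earlier snapshot the second value is None, i.e. the first run is simply a full copy without --link-dest. Unlike the fixed 0-15 rotation used below, this scheme needs a separate rule for deleting old dates.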

Download my Windows XP backup package (I used the open source 7-Zip archiver, which compresses much better than anything else right now). Extract it into e.g. c:\Backup, and let's have a look inside:

C:\> cd Backup
C:\Backup> dir
 Volume in drive C has no label.
 Volume Serial Number is XXXX-XXXX

 Directory of C:\Backup

15/04/2009  06:10 pm    <DIR>          .
15/04/2009  06:10 pm    <DIR>          ..
24/04/2008  12:51 pm    1.157.632 cygcrypto-0.9.8.dll \
23/10/2006  01:44 am      999.424 cygiconv-2.dll      | 
20/11/2005  04:13 am       31.744 cygintl-3.dll       | Cygwin DLLs
23/10/2006  02:23 am       31.744 cygintl-8.dll       |=>  for
09/06/2002  07:50 am       22.528 cygpopt-0.dll       |  rsync.exe
22/05/2008  09:02 pm    2.329.849 cygwin1.dll         | 
16/10/2006  03:10 am       66.048 cygz.dll           /
28/09/2004  02:07 pm        6.656 dosdev.exe      => MS tool
15/04/2009  01:52 pm           62 mybackup.cmd
23/05/2008  09:52 pm      915.896 rsync.exe       => Cygwin tool
01/11/2006  02:05 pm      150.328 sync.exe        => MS tool
08/06/2005  03:17 pm      294.912 vshadow.exe     => MS tool
08/06/2005  03:17 pm      352.256 vshadow2003andMaybeVista.exe => MS
15/04/2009  12:39 pm        1.219 vss-exec.cmd
              18 File(s)      6.639.134 bytes
               2 Dir(s)  80.913.649.664 bytes free
So we have a set of Microsoft and Cygwin tools, and two scripts. The backup process starts with mybackup.cmd:
C:\Backup> type mybackup.cmd
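@echo off
echo Creating backup directories on F:\Backups if missing
if not exist F:\Backups mkdir F:\Backups
rem Create the snapshot directories F:\Backups\0 ... F:\Backups\15
for /L %%p in (0,1,15) do if not exist F:\Backups\%%p mkdir F:\Backups\%%p
rem Flush filesystem buffers to disk (Sysinternals sync)
sync
rem Create a shadow copy of C: and run vss-exec.cmd against it
vshadow.exe -script=vss-setvar.cmd -exec=vss-exec.cmd C: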
First, we make sure the backup directories exist. We then invoke the sync command from Microsoft Sysinternals, which flushes all filesystem buffers to disk (just in case something goes bad - Windows does have blue screens, you know :‑)). Finally, we invoke vshadow.exe to create a shadow copy of the C: drive (if you are backing up a different drive, change this).
What if you don't use Windows XP? If you have Windows Server 2003 or Vista, you must use vshadow2003andMaybeVista.exe instead. I personally don't use Vista (and know no self-respecting sysadmin who does, either), so feel free to experiment and report any findings...
vshadow creates vss-setvar.cmd, a script that sets helpful environment variables relating to our "shadow" volume, and then invokes our vss-exec.cmd. Here it is:
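C:\Backup> type vss-exec.cmd
@echo off
rem Load the variables vshadow generated (e.g. SHADOW_DEVICE_1)
call vss-setvar.cmd
rem Give the shadow copy of C: the drive letter B:
dosdev B: %SHADOW_DEVICE_1%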
echo Removing oldest snapshot...
rmdir /S /Q F:\Backups\15
echo Rolling histories one snapshot ahead...
rename F:\Backups\14 15
rename F:\Backups\13 14
rename F:\Backups\12 13
rename F:\Backups\11 12
rename F:\Backups\10 11
rename F:\Backups\9 10
rename F:\Backups\8 9
rename F:\Backups\7 8
rename F:\Backups\6 7
rename F:\Backups\5 6
rename F:\Backups\4 5
rename F:\Backups\3 4
rename F:\Backups\2 3
rename F:\Backups\1 2
rename F:\Backups\0 1
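rem Copy B: into snapshot 0, hard-linking files unchanged since snapshot 1
rsync -rtDvx --chmod=ugo=rwX --delete --link-dest=/cygdrive/f/Backups/1 ^
  /cygdrive/b/ /cygdrive/f/Backups/0/
rem Remove the B: drive letter again
dosdev -r -d B: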
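Don't go blindly executing this; let's go through it first, step by step:

1. call vss-setvar.cmd loads the environment variables that vshadow generated, including SHADOW_DEVICE_1, the device name of our shadow copy.
2. dosdev B: %SHADOW_DEVICE_1% gives that shadow device a normal drive letter (B:), so that rsync can read the frozen copy of C:.
3. rmdir /S /Q F:\Backups\15 deletes the oldest snapshot.
4. The rename commands shift every snapshot one slot older (14 becomes 15, ..., 0 becomes 1), freeing up slot 0.
5. rsync copies B: into F:\Backups\0: -r recurses into directories, -t preserves modification times, -D preserves devices and special files, -v is verbose, -x stays within one filesystem, --chmod=ugo=rwX makes everything accessible (see Permissions above), --delete removes files from the destination that no longer exist in the source, and --link-dest=/cygdrive/f/Backups/1 hard-links every file that is unchanged since the previous snapshot (F:\Backups\1) instead of copying it. Only new and modified files consume space.
6. dosdev -r -d B: removes the B: drive mapping again.

That's it.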
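The remove-then-rename rotation in vss-exec.cmd generalizes to any number of slots. Here is a Python sketch of the same logic as a loop; the function name and the keep parameter are hypothetical, and the default of 16 slots matches the directories 0-15 used above. As in the batch file, the oldest slot is deleted first, so on Windows the read-only-file problem described under Permissions would make shutil.rmtree fail here too.

```python
import os
import shutil


def rotate(base: str, keep: int = 16) -> None:
    """Shift snapshot dirs base/0..base/keep-2 up by one slot, dropping the oldest."""
    oldest = os.path.join(base, str(keep - 1))
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)  # like: rmdir /S /Q F:\Backups\15
    for i in range(keep - 2, -1, -1):  # 14 -> 15, 13 -> 14, ..., 0 -> 1
        current = os.path.join(base, str(i))
        if os.path.isdir(current):
            os.rename(current, os.path.join(base, str(i + 1)))
    # rsync would create slot 0 itself; we create it so every slot exists again.
    os.makedirs(os.path.join(base, "0"), exist_ok=True)
```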

The only remaining piece in the puzzle is the automatic invocation of mybackup.cmd at a convenient time. We can use the Windows Scheduler service for this:

C:\Backup> schtasks /Create /SC weekly /D MON,TUE,WED,THU,FRI ^
  /TN MyDailyBackup /ST 23:30:00 /TR c:\Backup\mybackup.cmd ^
  /RU SEMANTIX\ttsiodras /RP mypassword
The /RU and /RP options specify the account under which the backup will run. Make sure you use an account with backup privileges (the Administrator account will of course work just fine - but it's not a good policy, security-wise). With the invocation above, the machine will be backed up automatically every weekday night at 11:30pm. If you want to check that this works without waiting for the middle of the night, do your first backup right now (it will take longer, since it has to copy all the data - the following backups will be very fast):
C:\Backup> schtasks /Create /SC Once /TN MyFirstBackup ^
  /ST 14:10:00 /TR c:\Backup\mybackup.cmd ^
  /RU SEMANTIX\ttsiodras /RP mypassword

(Change 14:10:00 to a minute or two ahead of your current time.)
I hope you'll find this process as useful as I have... It is simple to understand and easy to execute (even for newbies - just change the drive letters to the ones used in your PC).

P.S. For those of you who want a taste of things to come: rsync is forced to make a complete copy of every file that has changed - so if, for example, you use VMware images (which come with huge .vmdk files), any change inside them (even one little sector's worth of data) will force a complete copy... and waste a lot of space. Copy-on-write filesystems like ZFS (and soon, btrfs) are far more efficient: if you run the rsync daemon on one of them, you can use rsync with the --inplace option and then use the filesystem's snapshot mechanism after rsync completes - which will only reserve space for the storage blocks that actually changed! If you have an OpenSolaris/ZFS server, you can already use this to back up your machines - with such storage gains that, for all intents and purposes, you can enjoy almost unlimited daily backups.
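The difference is easy to quantify. Suppose a single 512-byte sector changes inside a 20GB .vmdk file: the hard-link scheme stores the whole changed file again, while a copy-on-write snapshot only keeps the blocks that were rewritten. The sketch below counts those blocks; the 20GB file size and the write offset are made-up examples, and the 128KB block size is an assumption (it is ZFS's default recordsize).

```python
from typing import Iterable, Set, Tuple


def blocks_touched(writes: Iterable[Tuple[int, int]], block_size: int) -> Set[int]:
    """Indices of the fixed-size blocks hit by (offset, length) writes."""
    touched: Set[int] = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return touched


GB = 1024 ** 3
KB = 1024
file_size = 20 * GB       # assumed size of the .vmdk
block_size = 128 * KB     # assumed block size (ZFS default recordsize)
writes = [(5 * GB, 512)]  # one 512-byte sector rewritten

file_level = file_size    # hard-link snapshots: the changed file is stored again in full
block_level = len(blocks_touched(writes, block_size)) * block_size
print(file_level // block_level)  # 163840
```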


Updated: Sat Oct 8 12:33:59 2022