Keeping Remote Files In Sync

July 2014 | by David Tansley

When you need to keep local files or directories in sync with a remote host, the rsync utility ticks all the boxes. Rsync is the tool of choice when setting up and maintaining a warm disaster-recovery (DR) box, so when you need to cut over, you can be sure that the application file systems are up to date on the remote host.

In this article, I won’t show you how to use rsync per se; rather I’ll show you how to use rsync to keep like-for-like files on a remote host. Why would you use rsync to copy files? For starters, once the initial transfer has completed, it’s quicker than SCP, tar or cp: when re-executed, rsync transfers only the differences in files, which SCP, tar and cp cannot do. If a copy gets suspended or halted, you can re-issue the command and let rsync transfer any files that were not copied across because of the halt or network outage.

When using rsync, the transfer is typically carried out by user root. In normal circumstances, this means that root’s SSH keys have been exchanged with the remote server, so you get no password prompt when executing rsync. Once rsync has copied files across to the remote host, when it is executed again with the same source and destination paths, it will trickle feed any changes in the files, so you don’t have to redo the whole copy operation. It looks for file changes and transfers only what is new or different on the local host.

Of course, if you really want a truly synced remote host, it would be advantageous to remove any files that are present on the remote host but don’t exist on the local host. That may seem confusing at first, but you don’t want extra files residing on the remote host that don’t exist on the local host. Otherwise you may get caught out by old configuration files for an app when you fire up the application on the DR box.

The Basics and the Slash

To use rsync for a like-for-like copy, first determine what directories you want copied. Let’s assume I have the following structure on the local host:

/opt/websphere

I want those directory contents copied to the remote host:

 bravo

So I could use the following:

rsync -azv /opt/websphere/ bravo:/opt/websphere

Note we have a trailing slash on the source directory. This tells rsync to copy the contents of the source directory and everything below it, rather than the directory itself. If you want a like-for-like copy, keep the trailing slash on the source. If you leave it off, rsync creates the source directory inside the destination, so you could end up with: /opt/websphere/websphere on the remote side, which is probably not what you want. For a directory copy like this one, a trailing slash on the destination makes no difference.
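The effect of the trailing slash on the source can be checked locally before touching a remote host. The following is a minimal sketch, assuming rsync is installed; the temporary directories are hypothetical stand-ins for /opt/websphere and the remote host.

```shell
#!/bin/sh
# Hypothetical local stand-ins for the source and the two destinations.
demo=$(mktemp -d)
mkdir -p "$demo/src/websphere" "$demo/with_slash" "$demo/without_slash"
echo "port=9080" > "$demo/src/websphere/app.conf"

# Trailing slash on the source: copy the CONTENTS of websphere.
rsync -a "$demo/src/websphere/" "$demo/with_slash"

# No trailing slash: copy the websphere directory itself into the destination.
rsync -a "$demo/src/websphere" "$demo/without_slash"

# List what landed where.
find "$demo/with_slash" "$demo/without_slash" -type f
```

The first destination should hold app.conf directly, while the second should hold it one level down, under a websphere directory.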

The options passed to rsync in my example are:

a	Archive mode: copy recursively and preserve file attributes and symbolic links
z	Compress as we copy
v	Verbose output

If you’re executing rsync as a different user, you can specify the user with the rsync parameters. For example, if I were transferring the files as user telly, one method I could use is:

rsync -azv -e ssh /opt/websphere/ telly@bravo:/opt/websphere

Do a Dry Run First

Before proceeding with rsync, I find it’s always best to do a dry run first, especially when copying data from a production host to a DR host. Use the ‘-n’ option to do a dry run and watch the output to screen, then confirm what’s being copied is what you want. If you’re satisfied all is correct, fire off the rsync command without the ‘-n’.
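As a rough illustration of the dry-run habit, here is a minimal sketch that can be tried without a DR host. It assumes rsync is installed; the temporary directories and the file name are hypothetical stand-ins for the production and DR paths.

```shell
#!/bin/sh
# Hypothetical local stand-ins for the production and DR directories.
work=$(mktemp -d)
mkdir -p "$work/prod" "$work/dr"
echo "data" > "$work/prod/server.xml"

# -n (--dry-run) reports what would be sent without writing anything.
rsync -azvn "$work/prod/" "$work/dr"

# Capture the destination listing after the dry run; nothing was copied.
dry_listing=$(ls -A "$work/dr")
echo "after dry run: '${dry_listing}'"

# The same command without -n performs the copy.
rsync -azv "$work/prod/" "$work/dr"
echo "after real run: '$(ls -A "$work/dr")'"
```

Against a real host, the same pattern would be the article's command with n added to the options, for example rsync -azvn followed by the usual source and destination.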

Log and Display

Once an initial run of rsync has occurred and you’re happy with the results, it makes sense to run the rsync script from some sort of scheduler. However, logging the rsync output is also a good idea so you can review the contents of the transfer. Over time, you’ll see only the files that have changed on the local side getting transferred. Rsync does provide a --log-file option, but I prefer to use my own logging method.

Of particular interest is the progress meter on files. This can tell you how long certain files take, which is invaluable when deciding when to run rsync while the system is quiet. The progress meter is really designed for interactive use, but redirecting the output to a file still provides a good indication of the progress of the transfer by displaying the percentage transferred, rather like the SCP command when transferring a file.

h	Display numbers in a human-readable format
progress	Display progress of transfer

The following example logs all rsync output, success or failure, to the log file /tmp/dr_bravo.log. It also presents the transfer in the log file in an easier-to-read format:

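log=/tmp/dr_bravo.log
# Start each run with an empty log file
>"$log"
# Send both standard output and standard error to the log
rsync -azvh --progress /opt/websphere/ bravo:/opt/websphere >>"$log" 2>&1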
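One possible way to make success or failure explicit in the log is to wrap the command in a small helper that records the exit status. This is only a sketch, not the author's own logging method; run_logged is a hypothetical function name and the START/END line format is an assumption.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: empties LOGFILE, runs the command with stdout and
# stderr appended to it, then records whether the command succeeded.
run_logged() {
    logfile=$1
    shift
    : > "$logfile"
    echo "START $(date '+%Y-%m-%d %H:%M:%S'): $*" >> "$logfile"
    "$@" >> "$logfile" 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "END $(date '+%Y-%m-%d %H:%M:%S'): success" >> "$logfile"
    else
        echo "END $(date '+%Y-%m-%d %H:%M:%S'): FAILED with exit status $rc" >> "$logfile"
    fi
    # Pass the command's exit status back to the caller (e.g. the scheduler).
    return "$rc"
}
```

With this in a script, the transfer could be started as run_logged /tmp/dr_bravo.log rsync -azvh --progress /opt/websphere/ bravo:/opt/websphere, and the last line of the log then shows at a glance how the run ended.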

Remove Files Not Present on the Local Host

I already recommended removing files present on the remote host that aren’t present on the local host. Always use a dry run before invoking this option. Review the output, and then manually log in and compare it against the remote host to confirm it will remove only the files you expect. Once satisfied, fire off rsync with the delete option:

rsync -azv --delete /opt/websphere/ bravo:/opt/websphere
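To make the review step concrete, the following sketch pulls out just the lines that report deletions during a dry run. It assumes rsync is installed and that its verbose output marks removals with lines starting with "deleting"; the temporary directories and file names are hypothetical stand-ins for the local and DR hosts.

```shell
#!/bin/sh
# Hypothetical local stand-ins for the local host and the DR host.
work=$(mktemp -d)
mkdir -p "$work/local" "$work/dr"
echo "current" > "$work/local/app.conf"
echo "stale" > "$work/dr/old.conf"    # exists only on the "remote" side

# Dry run with --delete: list what would be removed, change nothing.
would_delete=$(rsync -azvn --delete "$work/local/" "$work/dr" | grep '^deleting ')
echo "$would_delete"

# After reviewing the list, run it for real.
rsync -azv --delete "$work/local/" "$work/dr"
ls -A "$work/dr"
```

Only old.conf should be reported for deletion here, and after the real run the destination should contain app.conf alone.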

Resume the Transfer

If you experience a network outage, chances are good that the rsync transfer was halted. By default, rsync then removes any partially transferred file. You can tell rsync to keep the partial file instead, so that when rsync is run again only the rest of that file is transferred, making the re-transfer a lot quicker. For this, you can use:

rsync -azv --partial /opt/websphere/ bravo:/opt/websphere

However, as a rule I don’t use this method. Rather, I just re-execute the rsync script (with no partial option). Even though this takes a few minutes longer, at least I can be sure the whole transfer has completed without having to check for partially transferred files.
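If the re-execution is to happen unattended, one option is a small retry loop around the command. This is a sketch under stated assumptions rather than part of the approach described above: retry is a hypothetical helper name, and the attempt count and delay are illustrative values.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs the command until it exits 0 or the
# attempts run out, then returns the last exit status.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@"
        rc=$?
        [ "$rc" -eq 0 ] && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts (exit status $rc)" >&2
            return "$rc"
        fi
        echo "attempt $attempt failed (exit status $rc); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

A call such as retry 3 60 rsync -azv /opt/websphere/ bravo:/opt/websphere would then try the whole transfer up to three times, a minute apart. Because rsync only sends what is missing or changed, each retry picks up where the completed files left off.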

Up-to-Date Daily

You can use SCP, cp or tar to copy files to a remote warm DR server, but nothing beats the flexibility of rsync. I use this utility for all our warm DR sites where I need to keep application and configuration directories up to date daily. I don’t know what I’d do without it.