scrounge.org

Use rsync to back up a directory tree of files

rsync is well suited to backing up or mirroring a directory tree of files from one machine to another, and to keeping the two machines "in sync." More on rsync features.

rsync fits in well with the Scrounge.org philosophy of having lots of cheap machines. Now you have a way of keeping your backup machines synchronized with your "main" machine.

Installation

Download and install rsync. The easy way is to use one of the binary packages, if one works for you. If you can use a Red Hat 6.0 RPM, then download the most current version from here. It is currently at version 2.4.5-1, so you would download rsync-2.4.5-1.i386.rpm. Then (as root) type:

rpm -Uvh rsync-2.4.5-1.i386.rpm

and it is installed. Substitute the most current version number as appropriate.

If you can't use the RPM file, or any of the other binary distributions, then you must download the source "tarball" file, untar it, and follow the instructions contained in the README file to compile and install it.

When you have it installed, type rsync --help to see if it is alive. It should display several screens of options. You must install rsync on all machines that you will be connecting to.
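As a quick sanity check on each machine, a small helper along these lines can confirm that rsync is on the PATH before you go further. This is only a sketch; the function name check_installed is made up for illustration and is not part of rsync.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): report whether a command
# can be found on the current PATH.
check_installed() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Run this on every machine that will take part in the backup.
check_installed rsync || echo "Install rsync before continuing."
```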

Configuring and testing the SSH connection

Warning! There are security implications with configuring SSH and rsync to allow "auto-login" with no password prompts. Make sure that you know what you are doing when configuring SSH, especially if you allow remote users to log into any of your machines.

Please read these additional thoughts on security when using rsync.

SSH is the preferred method of connecting with rsync (IMO). If you haven't already installed SSH on your machines, then see How to install and configure SSH. Before you can connect with rsync, using SSH as the transport layer, you must be able to slogin to the other host. So first try to log into the other machine by typing slogin hostname (where hostname is the name of the computer you are connecting to). Press Ctrl-D to log out.

If you want rsync to connect with auto-login (with no password prompt!), so that you can use rsync in an unattended script, you must get RSA keys working by following the procedures explained at the Getting started with SSH page.
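Before relying on auto-login in an unattended script, it can help to confirm that the login really works without a prompt. The sketch below assumes an OpenSSH client, whose BatchMode option makes ssh fail rather than ask for a password. The function name can_autologin and the SSH override variable are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Assumption: the OpenSSH client is installed. With BatchMode=yes, ssh
# fails instead of prompting, so success means key-based login works.
# SSH is a hypothetical override hook (defaults to "ssh").
SSH="${SSH:-ssh}"

can_autologin() {
    # $1 = user@host (or just host). Runs the remote command "true"
    # and returns ssh's exit status; all output is discarded.
    $SSH -o BatchMode=yes -o ConnectTimeout=5 "$1" true > /dev/null 2>&1
}
```

For example, can_autologin root@smpent && echo "auto-login OK" would print the message only if no password prompt was needed.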

If for some reason you don't want to (or can't) use SSH, then you must use the native RSH transport layer. In this case, you must be able to connect with rlogin (instead of SSH's slogin). See man rsh, man rlogin and maybe man rcp. Remove all instances of --rsh=ssh from the OPTS definitions in the script example, below.

rsync reference

A simple rsync script

Copy and paste the script into a text file. Look through it and change the variable definitions as needed. Save it under a name such as rsync_demo.sh. Change the permission bits so that it is executable (chmod 700 rsync_demo.sh). Create the excludes file. (See the script for an explanation.) Run the script by typing ./rsync_demo.sh


#!/bin/sh

# Simple rsync "driver" script.  (Uses SSH as the transport layer.)
# http://www.scrounge.org/linux/rsync.html

# Demonstrates how to use rsync to back up a directory tree from a local
# machine to a remote machine.  Then re-run the script, as needed, to keep
# the two machines "in sync."  It only copies new or changed files and ignores
# identical files.

# Destination host machine name
DEST="smpent"

# User that rsync will connect as
# Are you sure that you want to run as root, though?
USER="root"

# Directory to copy from on the source machine.
BACKDIR="/root/bin/"

# Directory to copy to on the destination machine.
DESTDIR="/root/bin/"

# excludes file - Contains wildcard patterns of files to exclude,
# e.g., *~, *.bak, etc.  One "pattern" per line.
# You must create this file, because OPTS below passes it to --exclude-from.
EXCLUDES=/root/bin/excludes

# Options.
# -n Don't do any copying, but display what rsync *would* copy. For testing.
# -a Archive. Mainly propagates file permissions, ownership, timestamps, etc.
# -u Update. Don't copy file if file on destination is newer.
# -v Verbose -vv More verbose. -vvv Even more verbose.
# See man rsync for other options.

# For testing.  Only displays what rsync *would* do and does no actual copying.
OPTS="-n -vv -u -a --rsh=ssh --exclude-from=$EXCLUDES --stats --progress"
# Does copy, but still gives a verbose display of what it is doing
#OPTS="-v -u -a --rsh=ssh --exclude-from=$EXCLUDES --stats"
# Copies and does no display at all.
#OPTS="--archive --update --rsh=ssh --exclude-from=$EXCLUDES --quiet"

# May be needed if run by cron?
export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

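# Only run rsync if $DEST responds to a single ping.
if ping -s 1 -c 1 "$DEST" > /dev/null 2>&1; then
    # $OPTS is deliberately unquoted so it splits into separate options.
    rsync $OPTS "$BACKDIR" "$USER@$DEST:$DESTDIR"
else
    # Report on stderr and exit non-zero so cron or a caller sees the failure.
    echo "Cannot connect to $DEST." >&2
    exit 1
fi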
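
The excludes file that the script's comments describe could be created along these lines. This is a minimal sketch: the ./excludes default is only a placeholder, so set EXCLUDES to whatever path the script uses (its example is /root/bin/excludes). The two patterns are the ones mentioned in the script's comments.

```shell
#!/bin/sh
# Assumption: ./excludes is a placeholder path. Set EXCLUDES to the same
# path the rsync script uses (the script's example is /root/bin/excludes).
EXCLUDES="${EXCLUDES:-./excludes}"

# One wildcard pattern per line, as the script's comments describe.
cat > "$EXCLUDES" <<'EOF'
*~
*.bak
EOF

echo "Wrote $(( $(wc -l < "$EXCLUDES") )) patterns to $EXCLUDES"
```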


Note. By default, rsync doesn't actually copy whole files between machines. Rather, it uses the rsync algorithm to find the differences between the source and destination files, and sends only the information needed to make the destination file identical to the source file. This is much more complicated than just copying the file, but it can drastically reduce the amount of data that has to be transferred.
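
To give a rough feel for the idea, the sketch below checksums two files block by block and lists the blocks that differ. It is a deliberately simplified illustration and not rsync's actual algorithm: real rsync uses a rolling checksum, so it can also match data that has shifted position, whereas this only compares blocks at fixed offsets. The function name changed_blocks and the default block size are made up for the example.

```shell
#!/bin/sh
# Simplified illustration only (hypothetical helper, NOT rsync's algorithm):
# print the index of every fixed-size block of NEW whose checksum differs
# from the block at the same offset in OLD.
changed_blocks() {
    old=$1 new=$2 bs=${3:-4}
    size=$(( $(wc -c < "$new") ))   # length of NEW in bytes
    i=0
    while [ $(( i * bs )) -lt "$size" ]; do
        a=$(dd if="$old" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$new" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        # Only blocks whose checksums differ would need to be sent.
        [ "$a" = "$b" ] || echo "$i"
        i=$(( i + 1 ))
    done
}
```

Calling changed_blocks oldfile newfile 4 prints one block index per line. Only those blocks would have to travel over the network, which is where the savings come from.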


Thanks to Brian, Eric, and Johannes Ullrich for their help in preparing this page.

Comments and corrections to me.
